Adrien Grand created LUCENE-8618:
------------------------------------
Summary: MMapDirectory's read ahead on random-access files might
trash the OS cache
Key: LUCENE-8618
URL: https://issues.apache.org/jira/browse/LUCENE-8618
Project: Lucene - Core
Issue Type: Improvement
Reporter: Adrien Grand

At Elastic, a case was reported to us that runs significantly slower with
MMapDirectory than with NIOFSDirectory. After a long analysis, we discovered
that it had to do with MMapDirectory's read-ahead of 2MB, which doesn't help
and even trashes the OS cache on stored fields and term vectors files, which
have a fully random access pattern (except at merge time).

The particular use case that exhibits the slowdown is performing updates, i.e.
we first look up a document based on its id, fetch stored fields, compute new
stored fields (e.g. after adding or changing the value of a field), and add the
document back to the index. We were able to reproduce the workload that this
Elasticsearch user described and measured a median throughput of 3600 updates/s
with MMapDirectory and 5000 updates/s with NIOFSDirectory. It even goes up to
5600 updates/s if you configure a FileSwitchDirectory to use MMapDirectory for
the terms dictionary and NIOFSDirectory for stored fields (postings files are
not relevant here since postings are inlined in the terms dict when docFreq=1
and indexOptions=DOCS).
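
To make the FileSwitchDirectory setup above more concrete, here is a minimal
sketch of the routing decision it boils down to: pick a backend per file based
on its extension. This is plain JDK code, not Lucene's API. ExtensionRouter,
Backend and the extension set (fdt/fdx for stored fields, tvd/tvx for term
vectors) are assumptions made for illustration, and the exact extensions depend
on the codec in use.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of Lucene: mirrors the kind of
// extension-based routing that a FileSwitchDirectory setup relies on.
final class ExtensionRouter {
    enum Backend { MMAP, NIO }

    // Assumed extensions of files with a fully random access pattern:
    // stored fields (fdt, fdx) and term vectors (tvd, tvx).
    private static final Set<String> RANDOM_ACCESS_EXTENSIONS =
        new HashSet<>(Arrays.asList("fdt", "fdx", "tvd", "tvx"));

    // Returns the text after the last dot, or "" if there is no dot.
    static String extensionOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot < 0 ? "" : fileName.substring(dot + 1);
    }

    // Random-access files go to the NIO-style backend, everything else
    // (terms dictionary, etc.) stays on mmap.
    static Backend backendFor(String fileName) {
        return RANDOM_ACCESS_EXTENSIONS.contains(extensionOf(fileName))
            ? Backend.NIO
            : Backend.MMAP;
    }
}
```

With this sketch, a stored fields file such as _0.fdt would be routed to the
NIO backend, while a terms dictionary file such as _0.tim would stay on mmap.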

While it is possible to work around this issue on top of Lucene, maybe this is
something that we could improve directly in Lucene, e.g. by propagating
information about the expected access pattern and avoiding mmap on files that
have a fully random access pattern (until Java exposes madvise in some way)?
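
As a rough illustration of what propagating the access pattern could look like,
the sketch below lets the caller pass a hint and only uses mmap when the hint
is not RANDOM. It uses JDK calls only (FileChannel.map and positional
FileChannel.read). PatternAwareReader and AccessPattern are hypothetical names,
not an existing or proposed Lucene API.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch: choose the read path from a caller-supplied hint.
final class PatternAwareReader {
    enum AccessPattern { SEQUENTIAL, RANDOM }

    static byte[] read(Path file, long position, int length, AccessPattern pattern)
            throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            byte[] out = new byte[length];
            if (pattern == AccessPattern.RANDOM) {
                // Positional reads: only the requested bytes are asked for,
                // and no mapping is created.
                ByteBuffer buf = ByteBuffer.wrap(out);
                long pos = position;
                while (buf.hasRemaining()) {
                    int n = channel.read(buf, pos);
                    if (n < 0) {
                        throw new EOFException("read past end of " + file);
                    }
                    pos += n;
                }
            } else {
                // Memory-map the requested region and copy it out.
                MappedByteBuffer mapped =
                    channel.map(FileChannel.MapMode.READ_ONLY, position, length);
                mapped.get(out);
            }
            return out;
        }
    }
}
```

Both branches return the same bytes. The only difference is how the data gets
to the caller, which is the part an access-pattern hint would control.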