Adrien Grand created LUCENE-8618:
------------------------------------

             Summary: MMapDirectory's read ahead on random-access files might trash the OS cache
                 Key: LUCENE-8618
                 URL: https://issues.apache.org/jira/browse/LUCENE-8618
             Project: Lucene - Core
          Issue Type: Improvement
            Reporter: Adrien Grand


At Elastic, we received a report of a workload that runs significantly slower
with MMapDirectory than with NIOFSDirectory. After a long analysis, we found
that the cause was MMapDirectory's read ahead of 2MB. It does not help, and it
even trashes the OS cache, on stored fields and term vectors files, because
these files have a fully random access pattern (except at merge time).
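
The sketch below shows the two I/O paths being compared, using only the JDK.
MMapDirectory reads index files through a memory mapping, so touching a page
that is not yet resident causes a page fault, and the kernel may read ahead
around that page. NIOFSDirectory instead issues positional reads on a
FileChannel for exactly the bytes it needs. The class name, the helper
`slicesMatch`, and the synthetic file are illustrative only; the sketch does
not measure the slowdown or the page cache, it only shows that both paths
return the same bytes through different mechanisms.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

class RandomReadSketch {

    // mmap-style read (the MMapDirectory approach): copy bytes out of a mapping.
    // A first touch of a non-resident page faults, and the kernel may read ahead.
    static byte[] readMapped(MappedByteBuffer map, int offset, int len) {
        ByteBuffer view = map.duplicate(); // own position, shared contents
        view.position(offset);
        byte[] out = new byte[len];
        view.get(out);
        return out;
    }

    // Positional-read style (the NIOFSDirectory approach): request exactly these bytes.
    static byte[] readPositional(FileChannel ch, long offset, int len) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(len);
        while (buf.hasRemaining()) {
            if (ch.read(buf, offset + buf.position()) < 0) {
                throw new IOException("unexpected end of file");
            }
        }
        return buf.array();
    }

    // Writes a synthetic file, reads one slice both ways, and checks both against the data.
    static boolean slicesMatch(int fileSize, int offset, int len) {
        try {
            Path file = Files.createTempFile("random-read-sketch", ".bin");
            file.toFile().deleteOnExit();
            byte[] data = new byte[fileSize];
            for (int i = 0; i < fileSize; i++) {
                data[i] = (byte) (i * 31);
            }
            Files.write(file, data);
            try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
                // The JDK has no public unmap; the mapping is released once garbage collected.
                MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
                byte[] expected = Arrays.copyOfRange(data, offset, offset + len);
                return Arrays.equals(expected, readMapped(map, offset, len))
                    && Arrays.equals(expected, readPositional(ch, offset, len));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(slicesMatch(1 << 20, 123_457, 1024)); // prints "true"
    }
}
```

The bytes are identical either way; what differs is how much surrounding data
the OS may pull into its cache to serve them, which is what matters for files
read at random positions.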

The particular use case that exhibits the slowdown is performing updates, i.e.
we first look up a document by its id, fetch its stored fields, compute new
stored fields (e.g. after adding a field or changing the value of a field) and
add the document back to the index. We reproduced the workload that this
Elasticsearch user described and measured a median throughput of 3600 updates/s
with MMapDirectory versus 5000 updates/s with NIOFSDirectory. Throughput even
goes up to 5600 updates/s with a FileSwitchDirectory configured to use
MMapDirectory for the terms dictionary and NIOFSDirectory for stored fields
(postings files are not relevant here, since postings are inlined in the terms
dictionary when docFreq=1 and indexOptions=DOCS).
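
A FileSwitchDirectory routes each file by its extension: files whose extension
is in its primary set go to the primary directory, all others to the secondary
one. The stdlib-only sketch below reproduces that routing rule so it can be
checked without Lucene. The extension set is an assumption: fdt/fdx for stored
fields (as in the benchmark above) plus tvd/tvx for term vectors, taken from
Lucene 7.x's default codec. With Lucene on the classpath, the same set would be
passed as `primaryExtensions` to something like
`new FileSwitchDirectory(primaryExtensions, new NIOFSDirectory(path), new MMapDirectory(path), true)`.
The class `StoredFieldsRouting` itself is hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

class StoredFieldsRouting {

    // Assumed extensions for stored fields (fdt/fdx) and term vectors (tvd/tvx)
    // in Lucene 7.x's default codec; check them against the codec in use.
    static final Set<String> RANDOM_ACCESS_EXTENSIONS = Collections.unmodifiableSet(
        new HashSet<>(Arrays.asList("fdt", "fdx", "tvd", "tvx")));

    // Extension = text after the last '.', or "" if there is none
    // (meant to match FileSwitchDirectory.getExtension).
    static String extension(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot == -1 ? "" : fileName.substring(dot + 1);
    }

    // Listed extensions go to the primary directory (NIOFSDirectory in this setup);
    // everything else, including the terms dictionary, goes to MMapDirectory.
    static String directoryFor(String fileName) {
        return RANDOM_ACCESS_EXTENSIONS.contains(extension(fileName))
            ? "NIOFSDirectory"
            : "MMapDirectory";
    }

    public static void main(String[] args) {
        String[] names = {"_0.fdt", "_0.fdx", "_0_Lucene50_0.tim", "_0.cfs", "segments_2"};
        for (String name : names) {
            System.out.println(name + " -> " + directoryFor(name));
        }
    }
}
```

One caveat of routing by extension: segments written as compound files (.cfs)
embed their stored fields inside the .cfs file, so they stay on MMapDirectory.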

While it is possible to work around this issue on top of Lucene, maybe this is
something we could improve directly in Lucene, e.g. by propagating information
about the expected access pattern and avoiding mmap for files that have a fully
random access pattern (until Java exposes madvise in some way)?
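
A minimal sketch of the proposal, assuming a hint is computed per opened file
from the kind of file and whether it is being opened for a merge. Every type
here (`FileKind`, `AccessPattern`, `ReadStrategy`) is hypothetical and not a
Lucene API; `AccessPattern` is only loosely modeled on madvise's
NORMAL/RANDOM/SEQUENTIAL advice. Lucene's existing `IOContext` already
distinguishes merges (`IOContext.Context.MERGE`), which is presumably where a
real implementation would get the `merging` flag.

```java
class AccessPatternHint {

    // Hypothetical file categories relevant to this issue.
    enum FileKind { TERMS_DICT, STORED_FIELDS, TERM_VECTORS }

    // Hypothetical access-pattern hint, loosely modeled on madvise advice values.
    enum AccessPattern { NORMAL, RANDOM, SEQUENTIAL }

    // Hypothetical choice between the two I/O paths discussed above.
    enum ReadStrategy { MMAP, POSITIONAL_READS }

    // Stored fields and term vectors are read at random positions, except when a
    // merge copies them, which this sketch assumes happens front to back.
    static AccessPattern hintFor(FileKind kind, boolean merging) {
        if (kind == FileKind.TERMS_DICT) {
            return AccessPattern.NORMAL;
        }
        return merging ? AccessPattern.SEQUENTIAL : AccessPattern.RANDOM;
    }

    // Avoid mmap (and its read-ahead) only when access is fully random.
    static ReadStrategy strategyFor(AccessPattern pattern) {
        return pattern == AccessPattern.RANDOM ? ReadStrategy.POSITIONAL_READS : ReadStrategy.MMAP;
    }

    public static void main(String[] args) {
        for (FileKind kind : FileKind.values()) {
            for (boolean merging : new boolean[] {false, true}) {
                System.out.println(kind + " merging=" + merging + " -> "
                    + strategyFor(hintFor(kind, merging)));
            }
        }
    }
}
```

Unlike routing by file extension, a hint like this lets the same stored fields
file be read through mmap during a merge and through positional reads when
serving random lookups.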



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
