[ 
https://issues.apache.org/jira/browse/LUCENE-8618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16748139#comment-16748139
 ] 

Adrien Grand commented on LUCENE-8618:
--------------------------------------

This index was hot in terms of usage, but it was much larger than the 
filesystem cache, so the cache could not hold all of it in memory.

> MMapDirectory's read ahead on random-access files might trash the OS cache
> --------------------------------------------------------------------------
>
>                 Key: LUCENE-8618
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8618
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>
> At Elastic, a case was reported to us that runs significantly slower with 
> MMapDirectory than with NIOFSDirectory. After a long analysis, we discovered 
> that it had to do with MMapDirectory's read-ahead of 2MB, which doesn't help 
> and even trashes the OS cache on stored fields and term vectors files, which 
> have a fully random access pattern (except at merge time).
>
> The particular use case that exhibits the slowdown is performing updates, 
> i.e. we first look up a document by its id, fetch its stored fields, compute 
> new stored fields (e.g. after adding or changing the value of a field) and 
> add the document back to the index. We were able to reproduce the workload 
> that this Elasticsearch user described and measured a median throughput of 
> 3600 updates/s with MMapDirectory and 5000 updates/s with NIOFSDirectory. It 
> even goes up to 5600 updates/s if you configure a FileSwitchDirectory to use 
> MMapDirectory for the terms dictionary and NIOFSDirectory for stored fields 
> (postings files are not relevant here since postings are inlined in the terms 
> dict when docFreq=1 and indexOptions=DOCS).
>
> While it is possible to work around this issue on top of Lucene, maybe this 
> is something that we could improve directly in Lucene, e.g. by propagating 
> information about the expected access pattern and avoiding mmap on files that 
> have a fully random access pattern (until Java exposes madvise in some way)?
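
To make the per-file routing idea from the description concrete, here is a minimal sketch in plain Java. It is not Lucene code: the class `ReadStrategyChooser`, its `Strategy` enum, and the `choose` method are hypothetical names used only for illustration. The extension list assumes the usual Lucene file names for stored fields (`.fdt`, `.fdx`) and term vectors (`.tvd`, `.tvx`). FileSwitchDirectory applies the same kind of decision by file extension, delegating to one of two underlying directories.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper, not part of Lucene: picks a read strategy per index file. */
final class ReadStrategyChooser {

    /** MMAP stands in for MMapDirectory, POSITIONAL_READ for NIOFSDirectory. */
    enum Strategy { MMAP, POSITIONAL_READ }

    // Assumption: stored fields (.fdt, .fdx) and term vectors (.tvd, .tvx)
    // are the files with a fully random access pattern outside of merges.
    private static final Set<String> RANDOM_ACCESS_EXTENSIONS =
        Collections.unmodifiableSet(
            new HashSet<>(Arrays.asList("fdt", "fdx", "tvd", "tvx")));

    private ReadStrategyChooser() {}

    /** Returns the strategy for a file name such as "_0.fdt" or "segments_1". */
    static Strategy choose(String fileName) {
        int dot = fileName.lastIndexOf('.');
        // Files without an extension fall through to the default (mmap).
        String extension = dot < 0 ? "" : fileName.substring(dot + 1);
        return RANDOM_ACCESS_EXTENSIONS.contains(extension)
            ? Strategy.POSITIONAL_READ
            : Strategy.MMAP;
    }

    public static void main(String[] args) {
        for (String name : new String[] {"_0.fdt", "_0.tim", "_0.tvd", "segments_1"}) {
            System.out.println(name + " -> " + choose(name));
        }
    }
}
```

With this split, only files that benefit from being mapped (such as the terms dictionary) go through mmap, while random-access files use plain positional reads and so avoid the read-ahead described above.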


