[jira] [Commented] (LUCENE-8618) MMapDirectory's read ahead on random-access files might trash the OS cache

Robert Muir (JIRA) Fri, 21 Dec 2018 06:08:08 -0800


    [ 
https://issues.apache.org/jira/browse/LUCENE-8618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16726761#comment-16726761
 ]


Robert Muir commented on LUCENE-8618:
-------------------------------------

{quote}
we first look up a document based on its id, fetch stored fields, compute new 
stored fields (eg. after adding or changing the value of a field) and add the 
document back to the index.
{quote}

I don't think we should make things complicated to optimize for this.

> MMapDirectory's read ahead on random-access files might trash the OS cache
> --------------------------------------------------------------------------
>
>                 Key: LUCENE-8618
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8618
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>
> At Elastic, a case was reported to us that runs significantly slower with 
> MMapDirectory than with NIOFSDirectory. After a long analysis, we discovered 
> that it had to do with MMapDirectory's read ahead of 2MB, which doesn't help 
> and even trashes the OS cache on stored fields and term vectors files which 
> have a fully random access pattern (except at merge time).
> The particular use-case that exhibits the slow-down is performing updates, 
> ie. we first look up a document based on its id, fetch stored fields, compute 
> new stored fields (eg. after adding or changing the value of a field) and add 
> the document back to the index. We were able to reproduce the workload that 
> this Elasticsearch user described and measured a median throughput of 3600 
> updates/s with MMapDirectory and 5000 updates/s with NIOFSDirectory. It even 
> goes up to 5600 updates/s if you configure a FileSwitchDirectory to use 
> MMapDirectory for the terms dictionary and NIOFSDirectory for stored fields 
> (postings files are not relevant here since postings are inlined in the terms 
> dict when docFreq=1 and indexOptions=DOCS).
> While it is possible to work around this issue on top of Lucene, maybe this 
> is something that we could improve directly in Lucene, eg. by propagating 
> information about the expected access pattern and avoiding mmap on files that 
> have a fully random access pattern (until Java exposes madvise in some way)?
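
The FileSwitchDirectory workaround mentioned in the description could be
sketched roughly as follows. This is only an illustration, not code from the
issue: the class StoredFieldsRouting and the method directoryFor are
hypothetical names, and the extension list (fdt/fdx for stored fields, tvd/tvx
for term vectors) is an assumption about the default codec. The sketch uses
only the JDK and models the per-extension routing; the trailing comment shows
how the same set would be handed to Lucene.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of Lucene: mirrors the per-extension routing
// that FileSwitchDirectory applies when choosing between two directories.
final class StoredFieldsRouting {

    // Assumed extensions for stored fields (fdt, fdx) and term vectors
    // (tvd, tvx); check these against the codec actually in use.
    static final Set<String> NIO_EXTENSIONS = Collections.unmodifiableSet(
            new HashSet<>(Arrays.asList("fdt", "fdx", "tvd", "tvx")));

    // Text after the last '.', or "" when the name has no extension.
    static String extension(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot == -1 ? "" : fileName.substring(dot + 1);
    }

    // Name of the Directory implementation that would serve this file.
    static String directoryFor(String fileName) {
        return NIO_EXTENSIONS.contains(extension(fileName))
                ? "NIOFSDirectory"
                : "MMapDirectory";
    }

    // With Lucene on the classpath, the same set would be passed as:
    //   new FileSwitchDirectory(NIO_EXTENSIONS,
    //       new NIOFSDirectory(path), new MMapDirectory(path), true);
}
```

Under these assumptions, the randomly accessed stored fields and term vectors
files are read through NIOFSDirectory, while everything else, including the
terms dictionary, stays on MMapDirectory.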



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

