[ https://issues.apache.org/jira/browse/LUCENE-8618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16726761#comment-16726761 ]
Robert Muir commented on LUCENE-8618:
-------------------------------------
{quote}
we first look up a document based on its id, fetch stored fields, compute new
stored fields (eg. after adding or changing the value of a field) and add the
document back to the index.
{quote}
I don't think we should make things complicated to optimize for this.
> MMapDirectory's read ahead on random-access files might trash the OS cache
> --------------------------------------------------------------------------
>
> Key: LUCENE-8618
> URL: https://issues.apache.org/jira/browse/LUCENE-8618
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Adrien Grand
> Priority: Minor
>
> At Elastic, a case was reported to us that runs significantly slower with
> MMapDirectory than with NIOFSDirectory. After a long analysis, we discovered
> that it had to do with MMapDirectory's read ahead of 2MB, which doesn't help
> and even trashes the OS cache on stored fields and term vectors files which
> have a fully random access pattern (except at merge time).
> The particular use-case that exhibits the slow-down is performing updates,
> ie. we first look up a document based on its id, fetch stored fields, compute
> new stored fields (eg. after adding or changing the value of a field) and add
> the document back to the index. We were able to reproduce the workload that
> this Elasticsearch user described and measured a median throughput of 3600
> updates/s with MMapDirectory and 5000 updates/s with NIOFSDirectory. It even
> goes up to 5600 updates/s if you configure a FileSwitchDirectory to use
> MMapDirectory for the terms dictionary and NIOFSDirectory for stored fields
> (postings files are not relevant here since postings are inlined in the terms
> dict when docFreq=1 and indexOptions=DOCS).
> While it is possible to work around this issue on top of Lucene, maybe this
> is something that we could improve directly in Lucene, eg. by propagating
> information about the expected access pattern and avoiding mmap on files that
> have a fully random access pattern (until Java exposes madvise in some way)?
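
The per-file split described in the issue (mmap for most files, plain
positional reads for stored fields and term vectors) can be sketched with the
JDK alone. This is only an illustration under stated assumptions, not Lucene
code: AccessPatternRouter and its methods are hypothetical names, and the
extension list (fdt, fdx, tvd, tvx) assumes the default codec's file naming
for stored fields and term vectors. In Lucene itself the same routing would be
done by a FileSwitchDirectory wrapping an MMapDirectory and an NIOFSDirectory.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Set;

/** Hypothetical helper (not part of Lucene): picks a read strategy per file extension. */
final class AccessPatternRouter {

    // Assumption: stored fields (.fdt, .fdx) and term vectors (.tvd, .tvx)
    // are the files with a random access pattern outside of merges.
    static final Set<String> RANDOM_ACCESS_EXTENSIONS = Set.of("fdt", "fdx", "tvd", "tvx");

    /** Returns the text after the last '.', or "" if the name has no extension. */
    static String extension(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot == -1 ? "" : fileName.substring(dot + 1);
    }

    /** True if the file should be read with positional reads instead of mmap. */
    static boolean avoidMmap(String fileName) {
        return RANDOM_ACCESS_EXTENSIONS.contains(extension(fileName));
    }

    /** Reads length bytes starting at offset, using the strategy chosen for this file. */
    static byte[] read(Path file, long offset, int length) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            byte[] out = new byte[length];
            if (avoidMmap(file.getFileName().toString())) {
                // Positional read: only the requested range is asked of the OS.
                ByteBuffer buffer = ByteBuffer.wrap(out);
                long position = offset;
                while (buffer.hasRemaining()) {
                    int n = channel.read(buffer, position);
                    if (n < 0) {
                        throw new EOFException("read past end of " + file);
                    }
                    position += n;
                }
            } else {
                // Memory-mapped read: page faults pull the data in, and the
                // kernel may read ahead beyond the requested range.
                MappedByteBuffer mapped =
                    channel.map(FileChannel.MapMode.READ_ONLY, offset, length);
                mapped.get(out);
            }
            return out;
        }
    }
}
```

Both branches return the same bytes; what differs is how the OS is asked for
them, which is where the read-ahead behaviour described above comes in.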