Hi Alexander,
> I understand that NIOFSDirectory also uses the FS cache, but doesn't
> MMapDirectory tend to fill up the cache with unnecessary data for
> a random access pattern due to sequential read-ahead? Our concern is
> that it can potentially lead to evicting hot pages used by another
> process on the same host, affecting its performance.
No, this is not the case (at least not on Linux or Solaris). There is no
difference between read() and a page fault triggered by mmap. Both read
the same pages and put them into the cache; mmap does not read more pages.
What gets read depends on fadvise or madvise, which Lucene does not change
(the OS decides).
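To make the two access paths concrete, here is a minimal sketch in plain Java (no Lucene involved). The class and method names are hypothetical. It only shows that a positional read() and a memory mapping return the same bytes for the same file region; that the kernel serves both from the same page cache is the claim made above and cannot be observed from Java code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

class ReadVsMmap {

    // Path 1: explicit positional read() calls on the file channel.
    static byte[] viaRead(Path file, long pos, int len) {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate(len);
            while (buf.hasRemaining()) {
                // Positional read: does not move the channel's own position.
                if (ch.read(buf, pos + buf.position()) < 0) {
                    break; // end of file reached before len bytes were read
                }
            }
            return Arrays.copyOf(buf.array(), buf.position());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Path 2: map the region; copying out of the buffer touches the pages
    // and lets the OS fault them in on demand.
    static byte[] viaMmap(Path file, long pos, int len) {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, pos, len);
            byte[] out = new byte[len];
            map.get(out);
            return out;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes a temporary file of fileSize bytes and compares both paths.
    // Assumes pos + len <= fileSize.
    static boolean sameBytes(int fileSize, long pos, int len) {
        try {
            Path file = Files.createTempFile("read-vs-mmap", ".bin");
            file.toFile().deleteOnExit();
            byte[] data = new byte[fileSize];
            for (int i = 0; i < data.length; i++) {
                data[i] = (byte) (i * 31);
            }
            Files.write(file, data);
            return Arrays.equals(viaRead(file, pos, len), viaMmap(file, pos, len));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sameBytes(1 << 20, 123_456L, 4096));
    }
}
```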
> As far as I can tell, Elasticsearch also avoids using mmap for everything
> by default, e.g. stored fields and term vectors are not mmapped.
Elasticsearch's reason is different: it does this because of the limited
number of mappings available in current kernels. Elasticsearch clusters
tend to have many indexes, and they do this to avoid too many mappings.
It has nothing to do with caching.
Stored fields and term vectors are valid candidates for not being mmapped
if you have pressure on the number of mappings. Their access pattern is
completely different. So what Elasticsearch does is a valid thing to do.
If you really want to spare mappings, use the stored fields / term
vectors approach. But then you also need to disable CFS files, which is
counterproductive, as it raises the number of mappings and file handles.
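A rough sketch of what choosing mmap per file type could look like, in plain Java. Everything here is hypothetical: the class, the method names, and the extension list (which assumes Lucene's usual stored-fields and term-vectors file extensions). It is not Elasticsearch's or Lucene's actual code. It also illustrates the CFS point: a compound file bundles all file types into one, so a per-extension rule cannot single out the stored fields inside it.

```java
import java.util.Set;

class MmapChooser {

    // Assumption: extensions taken to belong to stored fields (fdt, fdx, fdm)
    // and term vectors (tvd, tvx, tvm). Adjust to the codec actually in use.
    private static final Set<String> NO_MMAP =
            Set.of("fdt", "fdx", "fdm", "tvd", "tvx", "tvm");

    // Returns the part after the last dot, or "" if there is none.
    static String extension(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot < 0 ? "" : fileName.substring(dot + 1);
    }

    // True if the file should be opened via mmap, false if via read().
    static boolean useMmap(String fileName) {
        return !NO_MMAP.contains(extension(fileName));
    }

    public static void main(String[] args) {
        for (String name : new String[] {"_0.fdt", "_0.tvd", "_0.doc", "_0.cfs"}) {
            System.out.println(name + " -> " + (useMmap(name) ? "mmap" : "read()"));
        }
    }
}
```

Note that `_0.cfs` falls through to mmap as a whole, stored fields included, which is why this approach only works with compound files disabled.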
> Does it make sense or am I missing something? Is my understanding
> correct that it still makes sense to avoid mmapping files with a
> random access pattern on the most recent Lucene and JVM versions?
Who said this? This is simply not true! Myths....
One last word: with the first Lucene version released after Java 19 comes
out, you will be able to work around the "too many mappings" problem for
huge clouds of Elasticsearch clusters, thanks to a new mmap implementation
that is chosen via the multi-release lucene-core.jar file. This will allow
them to mmap everything when Java 19+ is used (and Java's preview features
are enabled). It works by using larger blocks of virtual memory per
mapping (currently limited to 1 gigabyte per mapping) =>
https://github.com/apache/lucene/pull/912
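The effect of the block size on the number of mappings is simple arithmetic, sketched below. The helper name is hypothetical, and the 16 GiB block size is only an illustrative assumption for "larger blocks"; the 1 GiB figure is the current limit mentioned above.

```java
class MappingCount {

    // Number of mappings needed to cover a file when each mapping
    // can span at most chunkBytes (ceiling division).
    static long mappingsNeeded(long fileBytes, long chunkBytes) {
        if (fileBytes < 0 || chunkBytes <= 0) {
            throw new IllegalArgumentException("sizes must be non-negative / positive");
        }
        return fileBytes / chunkBytes + (fileBytes % chunkBytes == 0 ? 0 : 1);
    }

    public static void main(String[] args) {
        long gib = 1L << 30;
        long file = 40 * gib; // hypothetical 40 GiB index file
        System.out.println("1 GiB blocks:  " + mappingsNeeded(file, gib));
        System.out.println("16 GiB blocks: " + mappingsNeeded(file, 16 * gib)); // assumed size
    }
}
```

With 1 GiB blocks the hypothetical 40 GiB file needs 40 mappings; with the assumed 16 GiB blocks it needs 3.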
Uwe
> Thank you,
> Alex
On Fri, Aug 19, 2022 at 2:42 AM Robert Muir <rcm...@gmail.com> wrote:
On Thu, Aug 18, 2022 at 1:47 PM Alexander Lukyanchikov
<alexanderlukyanchi...@gmail.com> wrote:
>
> Currently we are trying to avoid switching to MMAP because there
> is another process running on the same host and extensively
> utilizes the FS cache.
>
This makes no sense: NIOFSDirectory uses the FS cache in exactly the same
way as mmap. It just uses the read() interface instead.
A self-created problem!
--
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail:u...@thetaphi.de