So, just cat <Lucene_index_file> will do this.
Thanks
________________________________
From: Robert Muir <rcm...@gmail.com>
Sent: Tuesday, February 23, 2021 4:45 PM
To: Baris Kazar <baris.ka...@oracle.com>
Cc: java-user <java-user@lucene.apache.org>
Subject: Re: MMapDirectory vs In Memory Lucene Index (i.e., ByteBuffersDirectory)

The preload isn't magical.
It only "reads in the whole file" to get it cached, same as if you did that 
yourself with 'cat' or 'dd'.
It "warms" the file.

It just does this in an efficient way at the low level to make the warming 
itself efficient. It calls madvise() to tell the kernel to expect read-ahead, 
and then reads the first byte of every mmap'd page (which is enough to fault 
it in).
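
A minimal sketch of that page-touching idea using only the JDK, not Lucene's 
own code. It maps a file read-only and calls MappedByteBuffer.load(), which 
asks the JDK to bring the mapped pages into memory. The class name MmapWarm, 
the method warm() and the 1 GiB chunk size are hypothetical choices made for 
this illustration; the chunking is only there because a single mapping cannot 
exceed Integer.MAX_VALUE bytes.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

class MmapWarm {
    // Hypothetical chunk size: one mapping is limited to Integer.MAX_VALUE bytes,
    // so large files are mapped piece by piece.
    private static final long CHUNK = 1L << 30;

    /** Maps the file read-only in chunks and loads each chunk. Returns the bytes mapped. */
    static long warm(Path file) throws IOException {
        long total = 0;
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = ch.size();
            for (long pos = 0; pos < size; pos += CHUNK) {
                long len = Math.min(CHUNK, size - pos);
                MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, pos, len);
                // Best-effort request to fault the mapped pages into memory.
                buf.load();
                total += len;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            Path p = Paths.get(arg);
            System.out.println(p + ": warmed " + warm(p) + " bytes");
        }
    }
}
```

Note that load() is best effort: the kernel is still free to evict those pages 
later if memory gets tight.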

At the end of the day it doesn't matter if you wrote a shitty shell script that 
uses 'dd' to read in each index file and send it to /dev/null, or whether you 
spent lots of time writing fancy java code to call this preload thing: you get 
the same result, same end state.
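
For comparison, a rough Java equivalent of the 'dd ... of=/dev/null' approach: 
read every file in the index directory and throw the bytes away, so the only 
effect is that the OS caches them. This is a sketch under assumptions: the 
class name ReadWarm, the method warmDirectory() and the 1 MiB buffer are made 
up for illustration, and it only looks at regular files directly inside the 
directory.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class ReadWarm {
    /** Reads every regular file directly under indexDir and discards the bytes. Returns total bytes read. */
    static long warmDirectory(Path indexDir) throws IOException {
        byte[] buf = new byte[1 << 20]; // scratch buffer; contents are never used
        long total = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir)) {
            for (Path f : files) {
                if (!Files.isRegularFile(f)) {
                    continue; // skip subdirectories and special files
                }
                try (InputStream in = Files.newInputStream(f)) {
                    int n;
                    while ((n = in.read(buf)) != -1) {
                        total += n; // count the bytes, then drop them
                    }
                }
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args[0]);
        System.out.println("read " + warmDirectory(dir) + " bytes from " + dir);
    }
}
```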

Maybe the preload takes 18 seconds to "warm" the index, vs. your crappy shell 
script which takes 22 seconds. The preload matters mainly for servers and for 
portability (e.g. it works fine on Windows, where it obviously will not call 
madvise).

On Tue, Feb 23, 2021 at 4:18 PM <baris.ka...@oracle.com> wrote:

Thanks again, Robert. Could you please explain "preload"? Which functionality 
is that? We discussed a preload earlier in this thread.

Is there a Lucene URL / site that I can look at for preload?

Thanks for the explanations. This thread will be useful for many folks, I 
believe.

Best regards


On 2/23/21 4:15 PM, Robert Muir wrote:


On Tue, Feb 23, 2021 at 4:07 PM <baris.ka...@oracle.com> wrote:

What I want to achieve (problem statement):

The base case is a disk-based Lucene index with FSDirectory.

The speedup case was supposed to be an in-memory Lucene index with MMapDirectory.

On 64-bit systems, FSDirectory.open() already returns an MMapDirectory. So you 
don't need to do anything.

Either way, MMapDirectory and NIOFSDirectory are doing the same thing: reading 
your index as a normal file and letting the operating system cache it.
MMapDirectory is just better because it avoids some overheads, such as the 
read() system call, copying and buffering into Java memory space, etc.
Some of these overheads are only getting worse, e.g. spectre/meltdown-type 
fixes make syscalls 8x slower on my computer. So it is good that MMapDirectory 
avoids them.

So I suggest you just stop fighting the operating system: don't give your J2EE 
container huge amounts of RAM, and let the kernel do its job.
If you want to "warm" a cold system because nothing is in the kernel's cache, 
then look into preload and so on. It is just "reading files" to get them cached.
