This also brings me to another question:

Does using MMapDirectory over FSDirectory bring any advantage, with or
without tmpfs?

Best regards


On 12/14/20 2:17 PM, Jigar Shah wrote:
Thanks, Uwe

Yes, recommended: tmpfs/ramfs worked like a charm in our use case with a
read-only index, giving us very high throughput and consistent response
times on queries.

We had to build some redundancy around that service to make it highly
available, so that we can do a rolling update of the read-only index,
reducing the risk of downtime.



On Mon, Dec 14, 2020 at 1:51 PM Uwe Schindler <u...@thetaphi.de> wrote:

Hi,

as the writer of the original blog post, here are my comments:

Yes, MMapDirectory.setPreload() is the feature mentioned in my blog post
to load everything into memory - but that does not guarantee anything!
Still, I would not recommend using that function, because all it does is
touch every page of the file, so the Linux kernel puts it into the OS
cache - nothing more. IMHO it is very ineffective, as it slows down
opening the index for a stupid for-each-page-touch loop. It will do this
with EVERY page, whether it is used later or not! So this may take some
time until it is done. Later on, Lucene still needs to open the index
files, initialize its own data structures, ...
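
To illustrate the per-page touch described above, here is a minimal sketch
using plain java.nio. It is not Lucene's code: the class name
PreloadSketch, the 4 KiB ASSUMED_PAGE_SIZE, and the temporary file are
assumptions made for illustration only. MappedByteBuffer.load() is the
JDK's best-effort request to bring a mapping into physical memory; the
explicit loop only shows what "touching every page" amounts to.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class PreloadSketch {
    // Assumption: 4 KiB pages (common on Linux/x86-64); the real size is OS dependent.
    static final int ASSUMED_PAGE_SIZE = 4096;

    /** Maps the whole file read-only. Nothing is read yet; pages fault in on first access. */
    static MappedByteBuffer mapReadOnly(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // The mapping stays valid after the channel is closed.
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
    }

    /** Reads one byte per assumed page, whether or not it is needed later; returns the page count. */
    static int touchEveryPage(MappedByteBuffer buf) {
        int pages = 0;
        for (int pos = 0; pos < buf.limit(); pos += ASSUMED_PAGE_SIZE) {
            buf.get(pos); // absolute read, forces the page to be faulted in
            pages++;
        }
        return pages;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("preload-sketch", ".bin");
        file.toFile().deleteOnExit();
        Files.write(file, new byte[3 * ASSUMED_PAGE_SIZE + 1]);

        MappedByteBuffer buf = mapReadOnly(file);
        buf.load(); // best-effort hint; the kernel may still evict these pages later
        System.out.println("pages touched: " + touchEveryPage(buf)); // prints "pages touched: 4"
    }
}
```

Neither load() nor the loop pins anything in RAM, and the cost grows with
the file size rather than with what queries actually read.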

In general it is much better to open the index with MMapDirectory and
execute some "sample" queries. This will do exactly the same as the
preload function, but it is more "selective". Parts of the index which are
not used won't be touched, and on top of that, it will also load ALL the
required index structures to heap.

As always and as mentioned in my blog post: there's nothing that can ensure
your index will stay in memory. Please trust the kernel to do the right
thing. Why do you care at all?

If you are curious and want to have everything in memory all the time:
- use tmpfs as your filesystem (of course you will lose data when the OS
  shuts down)
- disable swap and/or disable swappiness
- use only as much heap as needed, keep all free memory for your index
  outside the heap.

Fake feelings of "everything in RAM" are misconceptions like:
- use RAMDirectory (deprecated): this may be a disaster, as described in
  the blog post
- use ByteBuffersDirectory: a little bit better, but this brings nothing,
  as the operating system kernel may still page out your index pages. They
  still live in/off heap and are part of usual paging. They are just no
  longer backed by a file.

Lucene does most of the stuff outside heap, live with it!

Uwe

-----
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail: u...@thetaphi.de

-----Original Message-----
From: baris.ka...@oracle.com <baris.ka...@oracle.com>
Sent: Sunday, December 13, 2020 10:18 PM
To: java-user@lucene.apache.org
Cc: BARIS KAZAR <baris.ka...@oracle.com>
Subject: MMapDirectory vs In Memory Lucene Index (i.e., ByteBuffersDirectory)

Hi,

It would be nice to create a Lucene index in files and then effectively
load it into memory once (since I use it in read-only mode). I am looking
into whether this is doable in Lucene.

I wish there were an option to load the whole Lucene index into memory.

Both of the URLs below link to the blog post from which I quote a very
nice section:
https://lucene.apache.org/core/8_5_0/core/org/apache/lucene/store/MMapDirectory.html
https://lucene.apache.org/core/8_5_2/core/org/apache/lucene/store/MMapDirectory.html

The following blog post mentions such an option to run in memory (see the
underlined sentence below):

https://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html?m=1

MMapDirectory will not load the whole index into physical memory. Why
should it do this? We just ask the operating system to map the file into
address space for easy access, by no means we are requesting more. Java
and the O/S optionally provide the option to try loading the whole file
into RAM (if enough is available), but Lucene does not use that option (we
may add this possibility in a later version).

My question is: is there such an option? Is the setPreload method meant
for this purpose, i.e., to load the whole Lucene index into memory?

I would like to use MMapDirectory and set my
JVM heap to 16G or a bit less (since my index is
around this much).

The Lucene 8.5.2 (8.5.0 as well) javadocs say:
public void setPreload(boolean preload)
Set to true to ask mapped pages to be loaded into physical memory on init.
The behavior is best-effort and operating system dependent.

For example, Lucene 4.0.0 does not have the setPreload method:

https://lucene.apache.org/core/4_0_0/core/org/apache/lucene/store/MMapDirectory.html

Happy Holidays
Best regards


PS: I know there is also the ByteBuffersDirectory class for an in-memory
Lucene index, but this requires creating the Lucene index on the fly.

This is great only for Lucene indexes that can be created quickly on the
fly.

Ekaterina has a nice article on this ByteBuffersDirectory class:

https://medium.com/@ekaterinamihailova/in-memory-search-and-autocomplete-with-lucene-8-5-f2df1bc71c36


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
