[ 
https://issues.apache.org/jira/browse/LUCENE-8635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16743730#comment-16743730
 ] 

Adrien Grand commented on LUCENE-8635:
--------------------------------------

This is pretty cool. I'm also happily surprised by how small the patch is.

bq. Do the tests use a randomized key for each PKLookup query, or are the keys reused across queries?

It uses random keys:
https://github.com/mikemccand/luceneutil/blob/7d3ee97a4349c300d399fd83fb11febdf4607f44/src/main/perf/PKLookupTask.java

bq. Adding a per-field setting might require a code change and would only take effect for data indexed with the new codec.

Technically we could make things work for existing segments since your patch 
doesn't change the file format.
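To make the per-field idea concrete, here is a minimal sketch of a read-time policy. The class name OffHeapFieldPolicy and the way the ALL keyword is handled are assumptions for illustration, not an existing Lucene API. The policy only consults the configured field names when a segment is opened and writes nothing to the index, which is why an option like this could also apply to segments written before it existed.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical read-time policy (not a Lucene API): decides per field whether
// its terms-index FST should be read off-heap. The decision is made at
// segment-open time and nothing is persisted, so older segments are covered too.
final class OffHeapFieldPolicy {
    /** Special keyword meaning "every field", as proposed in the issue. */
    static final String ALL = "ALL";

    private final Set<String> offHeapFields;

    OffHeapFieldPolicy(String... fields) {
        this.offHeapFields = Collections.unmodifiableSet(new HashSet<>(Arrays.asList(fields)));
    }

    /** True if the FST for this field should be loaded off-heap. */
    boolean isOffHeap(String fieldName) {
        return offHeapFields.contains(ALL) || offHeapFields.contains(fieldName);
    }

    public static void main(String[] args) {
        OffHeapFieldPolicy idOnly = new OffHeapFieldPolicy("id");
        System.out.println(idOnly.isOffHeap("id"));   // true
        System.out.println(idOnly.isOffHeap("body")); // false
        System.out.println(new OffHeapFieldPolicy(ALL).isOffHeap("body")); // true
    }
}
```

In this sketch a FieldReader-like component would simply ask isOffHeap(field) before choosing which FST constructor to call.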

In general I'm supportive of moving as much as we can to disk and relying on
the OS cache to keep the important parts in memory and the rest on disk. What
makes me want to be careful here is that access to the terms index is very
random, so performance might degrade badly if the OS cache doesn't hold the
whole terms index in memory. I'm not super familiar with the FST internals, so
I wonder whether there are changes we could make to it that would make it more
disk-friendly, e.g. by seeking backward as little as possible when looking up a
key?
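A rough illustration of why random access is the concern, assuming 4 KiB OS pages and a hypothetical 1 GiB memory-mapped terms index. This is a plain Java simulation, not Lucene code: it counts how many distinct pages a batch of lookups touches when the read offsets are random versus clustered.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

// Illustrative simulation, not Lucene code: counts the distinct OS pages that a
// batch of lookups touches in a hypothetical memory-mapped terms index.
final class PageTouchSimulation {
    static final long PAGE_SIZE = 4096; // assumed OS page size

    /** Number of distinct pages covered by reading one byte at each offset. */
    static int distinctPages(long[] offsets) {
        Set<Long> pages = new HashSet<>();
        for (long off : offsets) {
            pages.add(off / PAGE_SIZE);
        }
        return pages.size();
    }

    /** n offsets drawn uniformly at random from [0, indexSize). */
    static long[] randomOffsets(int n, long indexSize, long seed) {
        Random rnd = new Random(seed);
        long[] out = new long[n];
        for (int i = 0; i < n; i++) {
            out[i] = Math.floorMod(rnd.nextLong(), indexSize);
        }
        return out;
    }

    /** n offsets spaced `stride` bytes apart, starting at 0 (clustered access). */
    static long[] clusteredOffsets(int n, long stride) {
        long[] out = new long[n];
        for (int i = 0; i < n; i++) {
            out[i] = i * stride;
        }
        return out;
    }

    public static void main(String[] args) {
        long indexSize = 1L << 30; // hypothetical 1 GiB terms index
        int lookups = 10_000;
        System.out.println("random:    " + distinctPages(randomOffsets(lookups, indexSize, 42L)));
        System.out.println("clustered: " + distinctPages(clusteredOffsets(lookups, 64)));
    }
}
```

With 10,000 lookups, the clustered offsets stay within 157 pages, whereas the random ones touch close to one new page per lookup, and each of those is a potential disk read whenever it isn't already in the OS cache.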

> Lazy loading Lucene FST offheap using mmap
> ------------------------------------------
>
>                 Key: LUCENE-8635
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8635
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: core/FSTs
>         Environment: I used the following setup for the es_rally tests:
> a single-node i3.xlarge running ES 6.5
> es_rally was running on another i3.xlarge instance
>            Reporter: Ankit Jain
>            Priority: Major
>         Attachments: offheap.patch, rally_benchmark.xlsx
>
>
> Currently, the FST loads all the terms into heap memory when the index is
> opened. This causes frequent JVM OOM errors if the terms index gets big. A
> better approach would be to load the FST lazily using mmap, which ensures that
> only the terms that are actually needed get loaded into memory.
>  
> Lucene could expose an API for providing the list of fields whose terms should
> be loaded off-heap. I'm planning to take the following approach:
>  # Add a boolean property fstOffHeap to FieldInfo
>  # Pass the list of off-heap fields to Lucene during index open (ALL can be a
> special keyword for loading ALL fields off-heap)
>  # Initialize the fstOffHeap property during Lucene index open
>  # FieldReader invokes the default FST constructor or the OffHeap constructor
> based on the fstOffHeap property
>  
> I created a patch (which loads all fields off-heap), ran some benchmarks using
> es_rally, and the results look good.
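For readers less familiar with the two loading strategies discussed in this issue, here is a plain-JDK sketch. FstBytesLoader is a hypothetical name, and this is not Lucene's FST or MMapDirectory code. It contrasts copying the bytes onto the heap when the index is opened with mapping the file read-only so that the OS pages in only the regions that lookups actually touch.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Plain-JDK illustration (not Lucene code) of on-heap vs. mmap'ed off-heap loading.
final class FstBytesLoader {

    /** On-heap: the whole file is copied into a Java byte array up front. */
    static byte[] loadOnHeap(Path file) throws IOException {
        return Files.readAllBytes(file);
    }

    /**
     * Off-heap: a read-only mapping whose pages are loaded lazily on first access.
     * A single mapping is limited to 2 GiB; larger files need several mappings.
     */
    static ByteBuffer mapOffHeap(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // Per the FileChannel.map contract, the mapping stays valid after the channel is closed.
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("fst", ".bin");
        tmp.toFile().deleteOnExit();
        byte[] data = new byte[1 << 20];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        Files.write(tmp, data);

        byte[] heap = loadOnHeap(tmp);
        ByteBuffer mapped = mapOffHeap(tmp);
        // A random-access read at an arbitrary position returns the same byte either way.
        int pos = 123_457;
        System.out.println(heap[pos] == mapped.get(pos)); // true
    }
}
```

The trade-off matches the discussion above: the on-heap copy costs heap proportional to the file size at open time, while the mapping costs almost nothing up front but makes each lookup's latency depend on whether the touched pages are already in the OS cache.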



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
