[jira] [Commented] (LUCENE-8635) Lazy loading Lucene FST offheap using mmap

Ankit Jain (JIRA) Tue, 15 Jan 2019 19:05:30 -0800


    [ 
https://issues.apache.org/jira/browse/LUCENE-8635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16743577#comment-16743577
 ]


Ankit Jain commented on LUCENE-8635:
------------------------------------

Rally tests run against an underlying Elasticsearch cluster and cover use cases 
other than search, such as log analytics. I ran one iteration for multiple data 
sets and did not notice any significant performance degradation. Rather, I 
noticed a 6% improvement in indexing throughput for all the data sets. I should 
leave it running for more iterations, though, to get more conclusive evidence.

Thanks [~sokolov] for testing the changes. I think the impact is as expected, 
though maybe slightly higher for PKLookup. Do the tests use a randomized key 
for each PKLookup query, or are the keys reused across queries? That affects 
the overall throughput, because mmap loads pages lazily on first access.
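
To illustrate the lazy-loading point, here is a minimal sketch in plain Java 
NIO. It is not the code from offheap.patch and does not use Lucene classes; the 
class name, method names, and the temp file standing in for an on-disk FST are 
all assumptions made for illustration. Mapping the file does not read it; the 
OS typically faults a page in when a byte on it is first touched, so a repeated 
key tends to hit pages that are already resident while a random key may not.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration only; not Lucene's API.
final class MmapLazyReadSketch {

    // Writes a file where byte i holds (byte) i; a stand-in for an on-disk FST.
    static Path writeSampleFile(int size) {
        try {
            Path file = Files.createTempFile("fst-sketch", ".bin");
            byte[] data = new byte[size];
            for (int i = 0; i < size; i++) {
                data[i] = (byte) i;
            }
            Files.write(file, data);
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Maps the whole file read-only and reads one byte. Only the page that
    // contains `offset` needs to be resident for this read to succeed.
    static byte readAt(Path file, int offset) {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buffer =
                channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            return buffer.get(offset);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A benchmark that reuses keys would keep calling something like readAt on the 
same offsets, which is why I would expect it to show better throughput than one 
that draws a fresh random key for every query.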

Though I'm open to exposing a per-field setting in Lucene, I agree with 
[~dsmiley] that the 25% reduction in throughput affects only a tiny fraction of 
typical usage. Throughput should also be better if the same keys are reused 
across PKLookup queries. Adding a per-field setting might require a code change 
and would be effective only for data indexed with the new codec. My knowledge 
of Lucene settings is limited, so I might be wrong.
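
For reference, the per-field selection from the issue description (a list of 
off-heap fields, with ALL as a special keyword) could look roughly like the 
sketch below. The class, enum, and method names are hypothetical and are not 
part of Lucene or of the attached patch; in the real change this decision would 
sit wherever FieldReader picks the FST constructor.

```java
import java.util.Set;

// Hypothetical illustration only; not Lucene's API.
final class OffHeapFieldSelectionSketch {

    // Special keyword meaning "load every field's FST off-heap".
    static final String ALL = "ALL";

    enum FstLoadMode { ON_HEAP, OFF_HEAP }

    // Decides how a field's FST should be loaded, given the set of field
    // names passed in at index open.
    static FstLoadMode modeFor(String fieldName, Set<String> offHeapFields) {
        boolean offHeap =
            offHeapFields.contains(ALL) || offHeapFields.contains(fieldName);
        return offHeap ? FstLoadMode.OFF_HEAP : FstLoadMode.ON_HEAP;
    }
}
```

With this shape, fields that are not listed keep the current on-heap behavior, 
so a latency-sensitive field such as a primary key could stay on heap.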

> Lazy loading Lucene FST offheap using mmap
> ------------------------------------------
>
>                 Key: LUCENE-8635
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8635
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: core/FSTs
>         Environment: I used below setup for es_rally tests:
> single node i3.xlarge running ES 6.5
> es_rally was running on another i3.xlarge instance
>            Reporter: Ankit Jain
>            Priority: Major
>         Attachments: offheap.patch, rally_benchmark.xlsx
>
>
> Currently, the FST loads all the terms into heap memory during index open. 
> This causes frequent JVM OOM issues if the term size gets big. A better 
> approach would be to lazily load the FST using mmap. That ensures only the 
> required terms get loaded into memory.
>  
> Lucene can expose an API for providing the list of fields whose terms should 
> be loaded offheap. I'm planning to take the following approach:
>  # Add a boolean property fstOffHeap in FieldInfo
>  # Pass the list of offheap fields to Lucene during index open (ALL can be 
> a special keyword for loading ALL fields offheap)
>  # Initialize the fstOffHeap property during Lucene index open
>  # FieldReader invokes the default FST constructor or the OffHeap 
> constructor based on the fstOffHeap property
>  
> I created a patch (that loads all fields offheap) and ran some benchmarks 
> using es_rally; the results look good.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
