[ https://issues.apache.org/jira/browse/LUCENE-8635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16755344#comment-16755344 ]
Michael McCandless commented on LUCENE-8635:
--------------------------------------------

OK net/net it looks like there is a small performance impact for some queries, and a biggish (-7-8%) impact for {{PKLookup}}.

But this is a nice option to have for users who are heap constrained by the FSTs, so I wonder how we could add this option, off by default? E.g. users might want their {{id}} field to store the FST in heap (like today), but all other fields off-heap.

There is no index format change required here, which is nice, but Lucene doesn't make it easy to have read-time codec behavior changes, so maybe the solution is that at write-time we add an option, e.g. to {{BlockTreeTermsWriter}}, which stores it in the index, and then at read-time {{BlockTreeTermsReader}} checks that option and loads the FST accordingly? Then users could customize their codecs to achieve this.

Or I suppose we could add a global system property, e.g. our default stored fields writer has a property to turn on/off bulk merge, but I think we are trying not to use Java properties going forward?

Can anyone think of any other approaches to make this option possible?

> Lazy loading Lucene FST offheap using mmap
> ------------------------------------------
>
>                 Key: LUCENE-8635
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8635
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: core/FSTs
>         Environment: I used below setup for es_rally tests:
> single node i3.xlarge running ES 6.5
> es_rally was running on another i3.xlarge instance
>            Reporter: Ankit Jain
>            Priority: Major
>         Attachments: fst-offheap-ra-rev.patch, offheap.patch,
> optional_offheap_ra.patch, ra.patch, rally_benchmark.xlsx
>
>
> Currently, FST loads all the terms into heap memory during index open. This
> causes frequent JVM OOM issues if the term size gets big. A better way of
> doing this will be to lazily load FST using mmap. That ensures only the
> required terms get loaded into memory.
>
> Lucene can expose API for providing list of fields to load terms offheap. I'm
> planning to take following approach for this:
> # Add a boolean property fstOffHeap in FieldInfo
> # Pass list of offheap fields to lucene during index open (ALL can be
> special keyword for loading ALL fields offheap)
> # Initialize the fstOffHeap property during lucene index open
> # FieldReader invokes default FST constructor or OffHeap constructor based
> on fstOffHeap field
>
> I created a patch (that loads all fields offheap), did some benchmarks using
> es_rally and results look good.
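
The per-field choice discussed in this thread (e.g. {{id}} kept on heap, other fields off-heap, with ALL as a catch-all keyword) could be sketched in plain Java roughly as below. This is only an illustration under stated assumptions: {{FstLoadPolicy}}, {{isOffHeap}} and {{load}} are hypothetical names, not Lucene APIs, and the code does not reproduce any of the attached patches. It uses only standard JDK calls ({{FileChannel.map}} for the off-heap case, {{Files.readAllBytes}} for the on-heap case).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper (not part of Lucene): decides per field where FST bytes live. */
final class FstLoadPolicy {
    /** Special keyword from the issue description: load every field off-heap. */
    static final String ALL = "ALL";

    private final Set<String> offHeapFields;

    FstLoadPolicy(Set<String> offHeapFields) {
        // Defensive copy so later changes by the caller do not affect the policy.
        this.offHeapFields = new HashSet<>(offHeapFields);
    }

    /** True if the field was listed explicitly or the ALL keyword was given. */
    boolean isOffHeap(String field) {
        return offHeapFields.contains(ALL) || offHeapFields.contains(field);
    }

    /**
     * Returns the bytes of a (hypothetical) per-field FST file, either as a
     * memory-mapped buffer (pages are faulted in lazily by the OS) or as a
     * heap buffer that is read eagerly in full.
     */
    ByteBuffer load(String field, Path fstFile) {
        try {
            if (isOffHeap(field)) {
                try (FileChannel ch = FileChannel.open(fstFile, StandardOpenOption.READ)) {
                    // The mapping stays valid after the channel is closed.
                    return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
                }
            }
            return ByteBuffer.wrap(Files.readAllBytes(fstFile));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With a policy built from a set containing every field except {{id}}, lookups on {{id}} would keep reading from a heap buffer as today, while the remaining fields would only occupy heap for the buffer object itself. Whether such a setting should live in the codec, in the index, or in a system property is the open question above; the sketch does not take a position on that.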