[ https://issues.apache.org/jira/browse/LUCENE-8635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16756098#comment-16756098 ]
Mike Sokolov edited comment on LUCENE-8635 at 1/30/19 1:24 PM:
---------------------------------------------------------------

I agree that would be a good start. Perhaps as a separate issue we can add finer-grained, per-field control over whether the FST is held on or off heap.

Just to look a little way down that path: the nearest thing we have today seems to be {{get/setPreload()}} and {{get/setUseUnmap}} in {{MMapDirectory}}, but here one really wants a mapping by field name, and a Directory should not really be concerned with field names. Better would be an attribute of {{FieldInfo}}, where we have {{put/getAttribute}}. Then {{FieldReader}} can inspect the {{FieldInfo}} and pass the appropriate {{On/OffHeapStore}} when creating its {{FST}}.

What do you think?
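To make the {{FieldInfo}} attribute idea concrete, here is a minimal sketch of how a reader could choose a store from a per-field attribute. It is an illustration under stated assumptions, not a patch: {{FieldAttributes}} is a stand-in for the attribute map that {{FieldInfo}} exposes through {{putAttribute}}/{{getAttribute}}, and the attribute key {{fst.mode}}, its {{offheap}} value, and the {{FstMode}} enum are hypothetical names that do not exist in Lucene. Defaulting to on-heap when the attribute is absent keeps today's behavior for existing indexes.

```java
import java.util.HashMap;
import java.util.Map;

public class FstStoreSelection {
    // Hypothetical attribute key and value; not existing Lucene constants.
    static final String FST_MODE_KEY = "fst.mode";
    static final String OFF_HEAP_VALUE = "offheap";

    // Hypothetical stand-in for the choice between an on-heap and an off-heap FST store.
    enum FstMode { ON_HEAP, OFF_HEAP }

    // Stand-in for the per-field string attributes that FieldInfo carries.
    static final class FieldAttributes {
        final String name;
        private final Map<String, String> attributes = new HashMap<>();

        FieldAttributes(String name) {
            this.name = name;
        }

        String putAttribute(String key, String value) {
            return attributes.put(key, value);
        }

        String getAttribute(String key) {
            return attributes.get(key);
        }
    }

    // What a FieldReader-like component would do before creating its FST:
    // inspect the field's attribute and fall back to on-heap when it is unset.
    static FstMode chooseMode(FieldAttributes field) {
        String mode = field.getAttribute(FST_MODE_KEY);
        return OFF_HEAP_VALUE.equals(mode) ? FstMode.OFF_HEAP : FstMode.ON_HEAP;
    }

    public static void main(String[] args) {
        FieldAttributes id = new FieldAttributes("id");
        id.putAttribute(FST_MODE_KEY, OFF_HEAP_VALUE);
        FieldAttributes title = new FieldAttributes("title");

        System.out.println(id.name + " -> " + chooseMode(id));
        System.out.println(title.name + " -> " + chooseMode(title));
    }
}
```

The point of the design is that the Directory never sees a field name: the decision travels with the field's own metadata and is read where the FST is created.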
> Lazy loading Lucene FST offheap using mmap
> ------------------------------------------
>
>                 Key: LUCENE-8635
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8635
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: core/FSTs
>         Environment: I used below setup for es_rally tests:
> single node i3.xlarge running ES 6.5
> es_rally was running on another i3.xlarge instance
>            Reporter: Ankit Jain
>            Priority: Major
>         Attachments: fst-offheap-ra-rev.patch, fst-offheap-rev.patch, offheap.patch, optional_offheap_ra.patch, ra.patch, rally_benchmark.xlsx
>
>
> Currently, FST loads all the terms into heap memory during index open. This
> causes frequent JVM OOM issues if the term size gets big. A better way of
> doing this will be to lazily load FST using mmap. That ensures only the
> required terms get loaded into memory.
>
> Lucene can expose API for providing list of fields to load terms offheap. I'm
> planning to take following approach for this:
> # Add a boolean property fstOffHeap in FieldInfo
> # Pass list of offheap fields to lucene during index open (ALL can be
> special keyword for loading ALL fields offheap)
> # Initialize the fstOffHeap property during lucene index open
> # FieldReader invokes default FST constructor or OffHeap constructor based
> on fstOffHeap field
>
> I created a patch (that loads all fields offheap), did some benchmarks using
> es_rally and results look good.
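
For reference, steps 2 and 3 of the quoted plan (pass a list of off-heap fields at index open, with ALL as a special keyword, then initialize a per-field flag) could be sketched roughly as follows. This is only an illustration under assumptions: {{OffHeapFieldResolver}} and {{resolve}} are hypothetical names, the flag map stands in for the proposed {{fstOffHeap}} property, and nothing here is taken from the attached patches.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class OffHeapFieldResolver {
    // Special keyword from the issue description: load every field's FST off heap.
    static final String ALL = "ALL";

    // Hypothetical helper: computes the fstOffHeap flag for each field in the index
    // from the list of field names supplied when the index is opened.
    static Map<String, Boolean> resolve(List<String> indexFields, List<String> offHeapFields) {
        Set<String> requested = new HashSet<>(offHeapFields);
        boolean all = requested.contains(ALL);
        Map<String, Boolean> fstOffHeap = new LinkedHashMap<>();
        for (String field : indexFields) {
            fstOffHeap.put(field, all || requested.contains(field));
        }
        return fstOffHeap;
    }

    public static void main(String[] args) {
        List<String> fields = Arrays.asList("id", "title", "body");
        System.out.println(resolve(fields, Arrays.asList("id")));
        System.out.println(resolve(fields, Arrays.asList(ALL)));
    }
}
```

A reader would then consult the resolved flag for its field when deciding which FST constructor to invoke (step 4).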