[ https://issues.apache.org/jira/browse/LUCENE-8635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16753468#comment-16753468 ]
Mike Sokolov commented on LUCENE-8635:
--------------------------------------

I tried that [~akjain] and strangely got a big drop in performance! I just used a wrapper around {{IndexInput}} rather than the random access approach (using {{randomAccessSlice}}) and implemented {{skipBytes}} in the obvious way: by calling the delegate's {{skipBytes}}. But this is bad. The default implementation of that method comes from {{DataInput}}, and it actually reads bytes into a buffer rather than simply updating a pointer. I'm not sure I understand the rationale for that; it seems to have to do with checksumming? Possibly {{ByteBuffer(s)IndexInput}} could (should?) implement this more efficiently, or maybe it's required to do this reading; not sure.

At any rate, I think in this case we really just want to seek the pointer, so we can have our {{FST.BytesReader.skipBytes}} call {{IndexInput.seek}} instead of {{IndexInput.skipBytes}}.

> Lazy loading Lucene FST offheap using mmap
> ------------------------------------------
>
> Key: LUCENE-8635
> URL: https://issues.apache.org/jira/browse/LUCENE-8635
> Project: Lucene - Core
> Issue Type: New Feature
> Components: core/FSTs
> Environment: I used below setup for es_rally tests:
> single node i3.xlarge running ES 6.5
> es_rally was running on another i3.xlarge instance
> Reporter: Ankit Jain
> Priority: Major
> Attachments: fst-offheap-ra-rev.patch, offheap.patch, ra.patch,
> rally_benchmark.xlsx
>
>
> Currently, FST loads all the terms into heap memory during index open. This
> causes frequent JVM OOM issues if the term size gets big. A better way of
> doing this will be to lazily load FST using mmap. That ensures only the
> required terms get loaded into memory.
>
> Lucene can expose API for providing list of fields to load terms offheap.
> I'm planning to take following approach for this:
> # Add a boolean property fstOffHeap in FieldInfo
> # Pass list of offheap fields to lucene during index open (ALL can be
> special keyword for loading ALL fields offheap)
> # Initialize the fstOffHeap property during lucene index open
> # FieldReader invokes default FST constructor or OffHeap constructor based
> on fstOffHeap field
>
> I created a patch (that loads all fields offheap), did some benchmarks using
> es_rally and results look good.
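
To make the {{skipBytes}} versus {{seek}} point in the comment concrete, here is a minimal, self-contained sketch. It does not use Lucene: {{SketchInput}}, {{ArrayInput}} and {{SeekingReader}} are hypothetical stand-ins invented for this illustration, and the default skip only mimics the read-into-a-scratch-buffer behaviour the comment attributes to {{DataInput}}. The {{bytesCopied}} counter is also an assumption, added just to make the extra work visible.

```java
public class SkipVsSeekSketch {

    /** Hypothetical stand-in for an input abstraction; this is NOT Lucene's IndexInput. */
    static abstract class SketchInput {
        private static final int SKIP_BUFFER_SIZE = 1024;
        private byte[] skipBuffer; // scratch space, allocated on first skip

        abstract void readBytes(byte[] dst, int offset, int len);
        abstract void seek(long pos);
        abstract long getFilePointer();

        /** Default skip: reads the skipped bytes into a scratch buffer and discards them. */
        void skipBytes(long numBytes) {
            if (numBytes < 0) {
                throw new IllegalArgumentException("numBytes must be >= 0");
            }
            if (skipBuffer == null) {
                skipBuffer = new byte[SKIP_BUFFER_SIZE];
            }
            long skipped = 0;
            while (skipped < numBytes) {
                int step = (int) Math.min(SKIP_BUFFER_SIZE, numBytes - skipped);
                readBytes(skipBuffer, 0, step);
                skipped += step;
            }
        }
    }

    /** In-memory input that counts how many bytes were actually copied. */
    static final class ArrayInput extends SketchInput {
        private final byte[] data;
        private int pos;
        long bytesCopied; // instrumentation for this sketch only

        ArrayInput(byte[] data) {
            this.data = data;
        }

        @Override
        void readBytes(byte[] dst, int offset, int len) {
            System.arraycopy(data, pos, dst, offset, len);
            pos += len;
            bytesCopied += len;
        }

        @Override
        void seek(long newPos) {
            if (newPos < 0 || newPos > data.length) {
                throw new IllegalArgumentException("seek out of bounds: " + newPos);
            }
            pos = (int) newPos;
        }

        @Override
        long getFilePointer() {
            return pos;
        }
    }

    /** Wrapper whose skip only moves the pointer, as the comment suggests. */
    static final class SeekingReader {
        private final SketchInput in;

        SeekingReader(SketchInput in) {
            this.in = in;
        }

        void skipBytes(long numBytes) {
            in.seek(in.getFilePointer() + numBytes);
        }
    }

    public static void main(String[] args) {
        ArrayInput viaDefault = new ArrayInput(new byte[1 << 20]);
        viaDefault.skipBytes(500_000);
        System.out.println("default skip: pos=" + viaDefault.getFilePointer()
                + " copied=" + viaDefault.bytesCopied);

        ArrayInput viaSeek = new ArrayInput(new byte[1 << 20]);
        new SeekingReader(viaSeek).skipBytes(500_000);
        System.out.println("seek-based skip: pos=" + viaSeek.getFilePointer()
                + " copied=" + viaSeek.bytesCopied);
    }
}
```

Both paths end at the same position, but the default path copies every skipped byte while the seek-based path copies none. Whether this accounts for the whole slowdown observed with the real {{IndexInput}} wrapper would still need to be measured.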