I used the wikimedium2m data set for the second set of tests (the first test was on a tiny index - 10k docs) -- at least I think I did! I am kind of new to the benchmarking game. I ran the benchmarks with python src/python/localrun.py -source wikimedium2m, and I can see that the index dir is 861M.
On Wed, Jan 16, 2019 at 7:18 PM Michael McCandless (JIRA) <j...@apache.org> wrote:

> [ https://issues.apache.org/jira/browse/LUCENE-8635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16744538#comment-16744538 ]
>
> Michael McCandless commented on LUCENE-8635:
> --------------------------------------------
>
> Thanks [~sokolov] – those numbers look quite a bit better! Though, your
> QPSs are kinda high overall – how many Wikipedia docs were in your index?
>
> I do wonder if we simply reversed the FST's byte[] when we create it, what
> impact that'd have on lookup performance. Hmm even if we did that, we'd
> still have to {{readBytes}} one byte at a time since {{RandomAccessInput}}
> does not have a {{readBytes}} method? But ... maybe {{IndexInput}} would
> give good performance in that case? We should probably pursue that
> separately though...
>
> > Lazy loading Lucene FST offheap using mmap
> > ------------------------------------------
> >
> > Key: LUCENE-8635
> > URL: https://issues.apache.org/jira/browse/LUCENE-8635
> > Project: Lucene - Core
> > Issue Type: New Feature
> > Components: core/FSTs
> > Environment: I used below setup for es_rally tests:
> > single node i3.xlarge running ES 6.5
> > es_rally was running on another i3.xlarge instance
> > Reporter: Ankit Jain
> > Priority: Major
> > Attachments: offheap.patch, ra.patch, rally_benchmark.xlsx
> >
> >
> > Currently, FST loads all the terms into heap memory during index open.
> > This causes frequent JVM OOM issues if the term size gets big. A better way
> > of doing this will be to lazily load FST using mmap. That ensures only the
> > required terms get loaded into memory.
> >
> > Lucene can expose API for providing list of fields to load terms offheap.
> > I'm planning to take following approach for this:
> > # Add a boolean property fstOffHeap in FieldInfo
> > # Pass list of offheap fields to lucene during index open (ALL can be
> > special keyword for loading ALL fields offheap)
> > # Initialize the fstOffHeap property during lucene index open
> > # FieldReader invokes default FST constructor or OffHeap constructor
> > based on fstOffHeap field
> >
> > I created a patch (that loads all fields offheap), did some benchmarks
> > using es_rally and results look good.
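
For anyone following along, here is a rough Python sketch of the two access patterns discussed in the quoted comment. It is not Lucene code, and the helper names (make_demo_file, read_forward, read_backward) are made up for illustration. It memory-maps a file so the OS can page data in on access rather than copying the whole file up front, and it contrasts a bulk forward read with walking backwards one byte at a time, which is roughly what reading a reversed byte sequence forces when no bulk read is available.

```python
import mmap
import os
import tempfile


def make_demo_file():
    """Write 1 KiB of known bytes (0..255 repeated 4 times) and return the path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(bytes(range(256)) * 4)
    return path


def read_forward(path, offset, length):
    """Bulk read: one slice of `length` bytes starting at `offset`."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; the OS typically pages data in on demand.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return mm[offset:offset + length]


def read_backward(path, start, length):
    """Byte-at-a-time read: walk backwards from `start` for `length` bytes."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Indexing an mmap with an int yields a single byte value.
            return bytes(mm[start - i] for i in range(length))
```

If the bytes were stored in forward order, the same logical sequence could be fetched with a single read_forward call instead of a per-byte loop. Whether that makes a measurable difference in Lucene is exactly the open question, and this sketch says nothing about that.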