[ https://issues.apache.org/jira/browse/LUCENE-8635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16760118#comment-16760118 ]
Ankit Jain commented on LUCENE-8635:
------------------------------------

I have created [pull request|https://github.com/apache/lucene-solr/pull/563] with the proposed changes. Though surprisingly, I still see some impact on the PKLookup performance.

{code:title=wikimedium10m|borderStyle=solid}
Task                      QPS baseline   StdDev  QPS candidate   StdDev  Pct diff
PKLookup                        117.45   (2.2%)         108.72   (2.3%)     -7.4%  ( -11% - -3%)
OrHighNotMed                   1094.23   (2.5%)        1057.88   (2.7%)     -3.3%  ( -8% - 1%)
OrHighNotLow                   1047.30   (1.7%)        1012.91   (2.5%)     -3.3%  ( -7% - 1%)
Fuzzy2                           44.10   (2.3%)          42.71   (2.7%)     -3.2%  ( -7% - 1%)
OrNotHighLow                   1022.67   (2.5%)         992.28   (2.4%)     -3.0%  ( -7% - 1%)
BrowseDayOfYearTaxoFacets      7907.19   (2.0%)        7677.99   (2.7%)     -2.9%  ( -7% - 1%)
OrNotHighMed                    866.37   (1.9%)         843.10   (2.3%)     -2.7%  ( -6% - 1%)
LowTerm                        2103.58   (3.5%)        2048.98   (3.6%)     -2.6%  ( -9% - 4%)
BrowseMonthTaxoFacets          7883.86   (2.0%)        7692.48   (2.1%)     -2.4%  ( -6% - 1%)
Fuzzy1                           64.44   (1.9%)          62.88   (2.3%)     -2.4%  ( -6% - 1%)
OrNotHighHigh                   779.27   (2.0%)         761.04   (2.1%)     -2.3%  ( -6% - 1%)
Respell                          55.60   (2.6%)          54.34   (2.3%)     -2.3%  ( -7% - 2%)
OrHighNotHigh                   877.28   (2.2%)         858.10   (2.5%)     -2.2%  ( -6% - 2%)
BrowseMonthSSDVFacets            14.85   (7.9%)          14.57  (10.7%)     -1.9%  ( -18% - 18%)
MedTerm                        1984.26   (3.6%)        1947.76   (2.3%)     -1.8%  ( -7% - 4%)
AndHighLow                      718.71   (1.5%)         706.06   (1.6%)     -1.8%  ( -4% - 1%)
OrHighLow                       523.40   (2.5%)         515.56   (2.4%)     -1.5%  ( -6% - 3%)
HighTerm                       1381.10   (2.9%)        1360.80   (2.7%)     -1.5%  ( -6% - 4%)
HighTermMonthSort               120.45  (12.3%)         119.00  (16.4%)     -1.2%  ( -26% - 31%)
BrowseDayOfYearSSDVFacets        11.55   (9.7%)          11.45  (10.0%)     -0.8%  ( -18% - 20%)
AndHighMed                      155.15   (2.6%)         154.25   (2.4%)     -0.6%  ( -5% - 4%)
OrHighMed                        88.00   (2.5%)          87.85   (2.7%)     -0.2%  ( -5% - 5%)
LowPhrase                        80.53   (1.6%)          80.40   (1.4%)     -0.2%  ( -3% - 2%)
AndHighHigh                      41.91   (4.2%)          41.86   (2.9%)     -0.1%  ( -6% - 7%)
MedPhrase                        46.29   (1.4%)          46.33   (1.5%)      0.1%  ( -2% - 3%)
IntNRQ                          127.54   (0.4%)         127.76   (0.4%)      0.2%  ( 0% - 1%)
HighTermDayOfYearSort            48.59   (5.1%)          48.71   (6.0%)      0.2%  ( -10% - 12%)
LowSloppyPhrase                  13.04   (4.0%)          13.08   (4.3%)      0.3%  ( -7% - 8%)
MedSloppyPhrase                  19.48   (2.3%)          19.54   (2.4%)      0.3%  ( -4% - 5%)
OrHighHigh                       23.60   (3.0%)          23.68   (2.9%)      0.3%  ( -5% - 6%)
HighPhrase                       20.25   (2.4%)          20.32   (1.8%)      0.3%  ( -3% - 4%)
HighSloppyPhrase                  9.29   (3.3%)           9.32   (3.2%)      0.4%  ( -5% - 7%)
LowSpanNear                      25.70   (3.8%)          25.89   (3.9%)      0.7%  ( -6% - 8%)
MedSpanNear                      30.46   (4.1%)          30.69   (4.3%)      0.7%  ( -7% - 9%)
HighSpanNear                     14.41   (4.3%)          14.60   (4.7%)      1.3%  ( -7% - 10%)
Wildcard                         70.08  (10.3%)          71.09   (6.1%)      1.4%  ( -13% - 19%)
BrowseDateTaxoFacets              2.37   (0.2%)           2.41   (0.3%)      1.5%  ( 0% - 1%)
Prefix3                          86.71  (11.4%)          89.04   (6.8%)      2.7%  ( -13% - 23%)
{code}

> Lazy loading Lucene FST offheap using mmap
> ------------------------------------------
>
>                 Key: LUCENE-8635
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8635
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: core/FSTs
>         Environment: I used below setup for es_rally tests:
> single node i3.xlarge running ES 6.5
> es_rally was running on another i3.xlarge instance
>            Reporter: Ankit Jain
>            Priority: Major
>         Attachments: fst-offheap-ra-rev.patch, fst-offheap-rev.patch,
> offheap.patch, optional_offheap_ra.patch, ra.patch, rally_benchmark.xlsx
>
>
> Currently, FST loads all the terms into heap memory during index open. This
> causes frequent JVM OOM issues if the term size gets big. A better way of
> doing this will be to lazily load FST using mmap. That ensures only the
> required terms get loaded into memory.
>
> Lucene can expose API for providing list of fields to load terms offheap. I'm
> planning to take following approach for this:
> # Add a boolean property fstOffHeap in FieldInfo
> # Pass list of offheap fields to lucene during index open (ALL can be
> special keyword for loading ALL fields offheap)
> # Initialize the fstOffHeap property during lucene index open
> # FieldReader invokes default FST constructor or OffHeap constructor based
> on fstOffHeap field
>
> I created a patch (that loads all fields offheap), did some benchmarks using
> es_rally and results look good.
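
For readers following along, the four-step plan in the issue description could be sketched roughly as below. This is only an illustration using plain java.nio, not the code from the patch or the pull request: the class OffHeapFstSketch, the helpers isOffHeap, loadOnHeap, loadOffHeap and open, and the use of a Set of field names with an "ALL" keyword are all hypothetical stand-ins for the proposed fstOffHeap flag and the FieldReader constructor choice.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Set;

public class OffHeapFstSketch {

    // Hypothetical equivalent of the proposed per-field fstOffHeap flag:
    // a field is off-heap if it is listed, or if the special keyword ALL is given.
    static boolean isOffHeap(String field, Set<String> offHeapFields) {
        return offHeapFields.contains("ALL") || offHeapFields.contains(field);
    }

    // On-heap path: the whole file is copied into a byte[] at open time.
    static ByteBuffer loadOnHeap(Path file) throws IOException {
        return ByteBuffer.wrap(Files.readAllBytes(file));
    }

    // Off-heap path: the file is memory-mapped, so the OS pages bytes in
    // only when they are touched. The mapping stays valid after the
    // channel is closed.
    static ByteBuffer loadOffHeap(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
    }

    // Stand-in for the reader choosing between the two constructors.
    static ByteBuffer open(String field, Path file, Set<String> offHeapFields)
            throws IOException {
        return isOffHeap(field, offHeapFields) ? loadOffHeap(file) : loadOnHeap(file);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("fst", ".bin");
        tmp.toFile().deleteOnExit();
        Files.write(tmp, new byte[] {1, 2, 3, 4});

        ByteBuffer id = open("id", tmp, Set.of("id"));     // mapped
        ByteBuffer body = open("body", tmp, Set.of("id")); // heap copy
        System.out.println("id direct=" + id.isDirect()
                + ", body direct=" + body.isDirect());
    }
}
```

The point of the sketch is only the branch in open: the heap path pays the full read cost at open time, while the mapped path defers it to first access, which is also where a lookup-heavy task such as PKLookup could plausibly see extra cost.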