[ https://issues.apache.org/jira/browse/LUCENE-8635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16753609#comment-16753609 ]
Ankit Jain edited comment on LUCENE-8635 at 1/27/19 10:14 PM:
--------------------------------------------------------------

Results for bigger data sets:

{code:title=wikimedium10m, java ...... -DFST.offheap=true|borderStyle=solid}
Task                       QPS baseline    StdDev   QPS candidate    StdDev   Pct diff
PKLookup                         117.59    (3.0%)          107.48    (2.3%)   -8.6% ( -13% - -3%)
OrHighNotMed                    1085.05    (2.1%)         1056.43    (2.2%)   -2.6% ( -6% - 1%)
OrNotHighLow                     976.94    (2.4%)          955.32    (1.8%)   -2.2% ( -6% - 2%)
OrHighNotLow                    1152.58    (2.6%)         1128.25    (2.0%)   -2.1% ( -6% - 2%)
Fuzzy1                            83.10    (2.6%)           81.54    (2.5%)   -1.9% ( -6% - 3%)
IntNRQ                            88.53   (16.2%)           86.92   (14.7%)   -1.8% ( -28% - 34%)
OrNotHighHigh                    886.10    (1.7%)          870.26    (1.4%)   -1.8% ( -4% - 1%)
OrHighNotHigh                    838.32    (1.8%)          824.15    (1.9%)   -1.7% ( -5% - 2%)
BrowseMonthTaxoFacets           8099.58    (2.0%)         7968.65    (1.8%)   -1.6% ( -5% - 2%)
Fuzzy2                            55.95    (2.7%)           55.08    (2.5%)   -1.6% ( -6% - 3%)
OrNotHighMed                     764.40    (2.3%)          752.56    (1.7%)   -1.5% ( -5% - 2%)
BrowseDayOfYearTaxoFacets       8081.37    (2.1%)         7957.27    (2.7%)   -1.5% ( -6% - 3%)
LowTerm                         1941.88    (5.2%)         1912.71    (4.0%)   -1.5% ( -10% - 8%)
HighTermMonthSort                 78.12   (10.8%)           76.99   (14.3%)   -1.4% ( -23% - 26%)
Respell                           61.23    (2.7%)           60.57    (2.7%)   -1.1% ( -6% - 4%)
HighTerm                        1526.16    (3.1%)         1510.23    (1.8%)   -1.0% ( -5% - 4%)
MedTerm                         1814.44    (3.7%)         1797.69    (2.1%)   -0.9% ( -6% - 5%)
OrHighLow                        443.93    (2.4%)          439.92    (2.5%)   -0.9% ( -5% - 4%)
AndHighLow                       577.60    (2.0%)          573.43    (1.4%)   -0.7% ( -4% - 2%)
Wildcard                          62.79    (5.8%)           62.54    (6.1%)   -0.4% ( -11% - 12%)
BrowseDayOfYearSSDVFacets         11.56    (8.0%)           11.55    (8.2%)   -0.0% ( -15% - 17%)
Prefix3                          165.76    (8.7%)          165.70    (9.2%)   -0.0% ( -16% - 19%)
MedSpanNear                       51.40    (2.3%)           51.48    (2.5%)    0.2% ( -4% - 5%)
BrowseMonthSSDVFacets             14.45   (13.6%)           14.47   (13.2%)    0.2% ( -23% - 31%)
HighTermDayOfYearSort             44.98    (6.8%)           45.05    (5.3%)    0.2% ( -11% - 13%)
OrHighMed                        111.81    (3.0%)          112.01    (2.8%)    0.2% ( -5% - 6%)
LowSpanNear                       47.14    (2.4%)           47.24    (2.5%)    0.2% ( -4% - 5%)
MedSloppyPhrase                   48.25    (1.9%)           48.37    (2.3%)    0.2% ( -3% - 4%)
LowSloppyPhrase                   35.36    (2.2%)           35.46    (2.5%)    0.3% ( -4% - 5%)
AndHighMed                       144.05    (3.6%)          144.53    (2.7%)    0.3% ( -5% - 6%)
HighSpanNear                       6.92    (3.5%)            6.95    (3.5%)    0.5% ( -6% - 7%)
MedPhrase                         25.88    (2.4%)           26.00    (1.4%)    0.5% ( -3% - 4%)
AndHighHigh                       38.77    (4.0%)           38.98    (3.9%)    0.5% ( -7% - 8%)
OrHighHigh                        27.47    (3.2%)           27.63    (3.1%)    0.6% ( -5% - 7%)
LowPhrase                         91.71    (4.3%)           92.56    (3.5%)    0.9% ( -6% - 9%)
HighSloppyPhrase                  18.28    (3.2%)           18.45    (3.6%)    0.9% ( -5% - 8%)
HighPhrase                        20.07    (3.9%)           20.35    (1.3%)    1.4% ( -3% - 6%)
BrowseDateTaxoFacets               2.37    (0.4%)            2.41    (0.2%)    1.4% ( 0% - 2%)
{code}

> Lazy loading Lucene FST offheap using mmap
> ------------------------------------------
>
>                 Key: LUCENE-8635
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8635
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: core/FSTs
>         Environment: I used the setup below for es_rally tests:
> single node i3.xlarge running ES 6.5
> es_rally was running on another i3.xlarge instance
>            Reporter: Ankit Jain
>            Priority: Major
>         Attachments: fst-offheap-ra-rev.patch, offheap.patch,
> optional_offheap_ra.patch, ra.patch, rally_benchmark.xlsx
>
>
> Currently, FST loads all the terms into heap memory during index open. This
> causes frequent JVM OOM issues if the term size gets big. A better way of
> doing this would be to lazily load the FST using mmap. That ensures only the
> required terms get loaded into memory.
>
> Lucene can expose an API for providing the list of fields to load terms offheap.
> I'm planning to take the following approach for this:
> # Add a boolean property fstOffHeap in FieldInfo
> # Pass the list of offheap fields to Lucene during index open (ALL can be
> a special keyword for loading ALL fields offheap)
> # Initialize the fstOffHeap property during Lucene index open
> # FieldReader invokes the default FST constructor or the OffHeap constructor
> based on the fstOffHeap field
>
> I created a patch (that loads all fields offheap), did some benchmarks using
> es_rally, and the results look good.
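
To make the heap versus mmap trade-off in the issue description concrete, here is a minimal Java sketch. It is not taken from the attached patches and does not use Lucene's FST classes: the class OffHeapBytesSketch and the methods loadOnHeap, mapOffHeap, and useOffHeap are hypothetical names chosen for illustration, and the "ALL" keyword handling is only one guess at how the proposed field list could be interpreted. Only standard JDK calls (Files.readAllBytes, FileChannel.map) are used.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Set;

/** Hypothetical illustration only; not Lucene's FST and not the attached patch. */
final class OffHeapBytesSketch {

    /** Assumed meaning of the proposed keyword: "ALL" selects every field. */
    static boolean useOffHeap(Set<String> offHeapFields, String field) {
        return offHeapFields.contains("ALL") || offHeapFields.contains(field);
    }

    /** On-heap: the whole file is copied into a byte[] when it is opened. */
    static byte[] loadOnHeap(Path file) throws IOException {
        return Files.readAllBytes(file);
    }

    /** Off-heap: the file is mapped; the OS pages bytes in when they are first read. */
    static MappedByteBuffer mapOffHeap(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // A mapping stays valid after the channel that created it is closed.
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("fst-sketch", ".bin");
        file.toFile().deleteOnExit();

        // Stand-in for serialized term data.
        byte[] data = new byte[1 << 20];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        Files.write(file, data);

        int pos = 123_456;
        for (String field : new String[] {"id", "body"}) {
            // Per-field choice between the two access paths, as in step 4 of the proposal.
            byte b = useOffHeap(Set.of("body"), field)
                    ? mapOffHeap(file).get(pos)  // touches only the page holding pos
                    : loadOnHeap(file)[pos];     // reads the full file first
            System.out.println(field + " -> " + b);
        }
    }
}
```

Both paths return the same byte; the difference is that the mapped variant keeps the data outside the Java heap and lets the operating system decide which pages stay resident, which is the behavior the proposal relies on.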