[ https://issues.apache.org/jira/browse/LUCENE-8635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16743067#comment-16743067 ]
Mike Sokolov commented on LUCENE-8635:
--------------------------------------

This looked interesting to me too, so I ran the benchmarks with the change. Sadly, the results were not great, which is surprising given that the Rally test results looked positive, I think? I'm not really sure how to interpret the Rally output, since I'm not familiar with that tool. Does it test query performance? Maybe there is a use case for this that differs from what the benchmarks test.

Here is what I saw after a benchmark run. This run is maybe a little unusual, since I have some mods to the benchmark (running with an 8-thread executor service, indexSort enabled, and topN=500, because of some other tests I was running). I can re-run with more "normal" settings, but this already looks kind of suspect.

{noformat}
                     Task  QPS before StdDev   QPS after StdDev  Pct diff
                 PKLookup      163.94 (2.3%)      123.50 (2.0%)    -24.7% ( -28% -  -20%)
               AndHighLow     5096.79 (1.2%)     4860.87 (1.5%)     -4.6% (  -7% -   -2%)
                   Fuzzy1      711.37 (2.3%)      681.03 (2.4%)     -4.3% (  -8% -    0%)
                   Fuzzy2      203.67 (2.6%)      196.77 (2.6%)     -3.4% (  -8% -    1%)
               AndHighMed     3460.06 (2.7%)     3346.84 (3.2%)     -3.3% (  -8% -    2%)
                LowPhrase     3448.68 (2.8%)     3345.41 (2.7%)     -3.0% (  -8% -    2%)
          LowSloppyPhrase     3278.72 (2.9%)     3184.03 (2.8%)     -2.9% (  -8% -    2%)
              LowSpanNear     3123.68 (2.9%)     3040.74 (2.6%)     -2.7% (  -7% -    2%)
                  Respell      716.61 (1.7%)      699.22 (1.8%)     -2.4% (  -5% -    1%)
                MedPhrase     2970.83 (3.2%)     2899.18 (3.0%)     -2.4% (  -8% -    3%)
              AndHighHigh     2626.26 (3.7%)     2563.37 (4.0%)     -2.4% (  -9% -    5%)
          MedSloppyPhrase     2642.66 (3.6%)     2582.02 (3.3%)     -2.3% (  -8% -    4%)
              MedSpanNear     2598.01 (3.5%)     2541.03 (3.2%)     -2.2% (  -8% -    4%)
     BrowseDateTaxoFacets     3467.39 (2.7%)     3399.62 (3.3%)     -2.0% (  -7% -    4%)
                  LowTerm     3896.13 (4.7%)     3824.62 (4.4%)     -1.8% ( -10% -    7%)
             HighSpanNear     1511.97 (4.7%)     1484.42 (4.6%)     -1.8% ( -10% -    7%)
                OrHighMed     1406.84 (5.7%)     1382.52 (5.8%)     -1.7% ( -12% -   10%)
                OrHighLow     1484.58 (6.1%)     1460.06 (6.0%)     -1.7% ( -12% -   11%)
               HighPhrase     1740.06 (4.5%)     1712.12 (4.4%)     -1.6% ( -10% -    7%)
         HighSloppyPhrase     1547.60 (4.7%)     1523.48 (4.6%)     -1.6% ( -10% -    8%)
    BrowseMonthTaxoFacets     9031.31 (2.1%)     8897.26 (2.6%)     -1.5% (  -6% -    3%)
               OrHighHigh     1111.59 (6.3%)     1095.29 (6.5%)     -1.5% ( -13% -   12%)
    HighTermDayOfYearSort     2197.07 (5.9%)     2166.89 (3.9%)     -1.4% ( -10% -    8%)
                  MedTerm     2621.21 (5.3%)     2586.41 (5.0%)     -1.3% ( -11% -    9%)
BrowseDayOfYearTaxoFacets     9011.41 (1.6%)     8907.44 (1.5%)     -1.2% (  -4% -    1%)
        HighTermMonthSort     2449.33 (5.5%)     2421.11 (4.4%)     -1.2% ( -10% -    9%)
                 HighTerm     1629.92 (6.5%)     1612.72 (6.4%)     -1.1% ( -13% -   12%)
                   IntNRQ      980.43 (9.1%)      973.72 (8.9%)     -0.7% ( -17% -   19%)
                 Wildcard     1779.82 (5.7%)     1771.12 (5.5%)     -0.5% ( -11% -   11%)
                  Prefix3     1790.47 (5.9%)     1781.85 (5.8%)     -0.5% ( -11% -   11%)
BrowseDayOfYearSSDVFacets     2038.63 (3.0%)     2032.32 (2.1%)     -0.3% (  -5% -    4%)
    BrowseMonthSSDVFacets     2295.02 (2.5%)     2303.01 (1.9%)      0.3% (  -4% -    4%)
{noformat}

> Lazy loading Lucene FST offheap using mmap
> ------------------------------------------
>
>                 Key: LUCENE-8635
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8635
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: core/FSTs
>         Environment: I used below setup for es_rally tests:
>                      single node i3.xlarge running ES 6.5
>                      es_rally was running on another i3.xlarge instance
>            Reporter: Ankit Jain
>            Priority: Major
>         Attachments: offheap.patch, rally_benchmark.xlsx
>
>
> Currently, FST loads all the terms into heap memory during index open. This
> causes frequent JVM OOM issues if the term size gets big. A better way of
> doing this will be to lazily load FST using mmap. That ensures only the
> required terms get loaded into memory.
>
> Lucene can expose API for providing list of fields to load terms offheap.
> I'm planning to take following approach for this:
> # Add a boolean property fstOffHeap in FieldInfo
> # Pass list of offheap fields to lucene during index open (ALL can be
> special keyword for loading ALL fields offheap)
> # Initialize the fstOffHeap property during lucene index open
> # FieldReader invokes default FST constructor or OffHeap constructor based
> on fstOffHeap field
>
> I created a patch (that loads all fields offheap), did some benchmarks using
> es_rally and results look good.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
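
For readers unfamiliar with the trade-off described in the quoted issue, here is a minimal sketch of the general idea: an eager reader that copies a file onto the heap versus a lazy reader that memory-maps it, chosen by a per-field flag. This is not Lucene code and not the attached patch. It is a Python illustration, and the names `HeapBytes`, `OffHeapBytes`, `open_terms`, and `fst_off_heap` are hypothetical, with the flag loosely mirroring the proposed fstOffHeap property.

```python
import mmap
import os
import tempfile


class HeapBytes:
    """Eager variant: copies the whole file onto the heap when opened."""

    def __init__(self, path):
        with open(path, "rb") as f:
            self._data = f.read()

    def read(self, offset, length):
        return self._data[offset:offset + length]

    def close(self):
        self._data = b""


class OffHeapBytes:
    """Lazy variant: maps the file; the OS pages data in only when it is read."""

    def __init__(self, path):
        self._file = open(path, "rb")
        # Length 0 maps the whole file; ACCESS_READ keeps the mapping read-only.
        self._map = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)

    def read(self, offset, length):
        # Slicing an mmap object returns a bytes copy of just that range.
        return self._map[offset:offset + length]

    def close(self):
        self._map.close()
        self._file.close()


def open_terms(path, fst_off_heap):
    """Pick a reader from a per-field flag (hypothetical, mirrors fstOffHeap)."""
    return OffHeapBytes(path) if fst_off_heap else HeapBytes(path)


def demo():
    """Write a small file, then read the same slice through both readers."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"apple|banana|cherry")
        results = []
        for flag in (False, True):
            reader = open_terms(path, flag)
            results.append(reader.read(6, 6))
            reader.close()
        return results
    finally:
        os.remove(path)
```

Both readers return the same bytes; the difference is only where the data lives and when it is loaded. The mapped variant trades heap usage for page-cache accesses on each lookup, so its cost depends on the access pattern and has to be measured rather than assumed.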