[jira] [Commented] (LUCENE-8635) Lazy loading Lucene FST offheap using mmap

Mike Sokolov (JIRA) Tue, 15 Jan 2019 05:42:16 -0800


    [ 
https://issues.apache.org/jira/browse/LUCENE-8635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16743067#comment-16743067
 ]


Mike Sokolov commented on LUCENE-8635:
--------------------------------------

This looked interesting to me too, so I ran the benchmarks with the 
change, but sadly the results were not great, which is surprising given the 
Rally test results, which I think looked positive? I'm not really sure how to 
interpret Rally output since I'm not familiar with that tool. Does it test 
query performance? Maybe there is a use case for this that is different from 
what the benchmarks test; here is what I saw after a benchmark run. This run 
is maybe a little unusual since I have some mods to the benchmark (running 
with an 8-thread executor service, indexSort enabled, topN=500, because of 
some other tests I was running). I can re-run with more "normal" settings, but 
this already looks kind of suspect.
{noformat}
                    Task  QPS before      StdDev   QPS after      StdDev                Pct diff
                PKLookup      163.94      (2.3%)      123.50      (2.0%)  -24.7% ( -28% -  -20%)
              AndHighLow     5096.79      (1.2%)     4860.87      (1.5%)   -4.6% (  -7% -   -2%)
                  Fuzzy1      711.37      (2.3%)      681.03      (2.4%)   -4.3% (  -8% -    0%)
                  Fuzzy2      203.67      (2.6%)      196.77      (2.6%)   -3.4% (  -8% -    1%)
              AndHighMed     3460.06      (2.7%)     3346.84      (3.2%)   -3.3% (  -8% -    2%)
               LowPhrase     3448.68      (2.8%)     3345.41      (2.7%)   -3.0% (  -8% -    2%)
         LowSloppyPhrase     3278.72      (2.9%)     3184.03      (2.8%)   -2.9% (  -8% -    2%)
             LowSpanNear     3123.68      (2.9%)     3040.74      (2.6%)   -2.7% (  -7% -    2%)
                 Respell      716.61      (1.7%)      699.22      (1.8%)   -2.4% (  -5% -    1%)
               MedPhrase     2970.83      (3.2%)     2899.18      (3.0%)   -2.4% (  -8% -    3%)
             AndHighHigh     2626.26      (3.7%)     2563.37      (4.0%)   -2.4% (  -9% -    5%)
         MedSloppyPhrase     2642.66      (3.6%)     2582.02      (3.3%)   -2.3% (  -8% -    4%)
             MedSpanNear     2598.01      (3.5%)     2541.03      (3.2%)   -2.2% (  -8% -    4%)
    BrowseDateTaxoFacets     3467.39      (2.7%)     3399.62      (3.3%)   -2.0% (  -7% -    4%)
                 LowTerm     3896.13      (4.7%)     3824.62      (4.4%)   -1.8% ( -10% -    7%)
            HighSpanNear     1511.97      (4.7%)     1484.42      (4.6%)   -1.8% ( -10% -    7%)
               OrHighMed     1406.84      (5.7%)     1382.52      (5.8%)   -1.7% ( -12% -   10%)
               OrHighLow     1484.58      (6.1%)     1460.06      (6.0%)   -1.7% ( -12% -   11%)
              HighPhrase     1740.06      (4.5%)     1712.12      (4.4%)   -1.6% ( -10% -    7%)
        HighSloppyPhrase     1547.60      (4.7%)     1523.48      (4.6%)   -1.6% ( -10% -    8%)
   BrowseMonthTaxoFacets     9031.31      (2.1%)     8897.26      (2.6%)   -1.5% (  -6% -    3%)
              OrHighHigh     1111.59      (6.3%)     1095.29      (6.5%)   -1.5% ( -13% -   12%)
   HighTermDayOfYearSort     2197.07      (5.9%)     2166.89      (3.9%)   -1.4% ( -10% -    8%)
                 MedTerm     2621.21      (5.3%)     2586.41      (5.0%)   -1.3% ( -11% -    9%)
BrowseDayOfYearTaxoFacets     9011.41      (1.6%)     8907.44      (1.5%)   -1.2% (  -4% -    1%)
       HighTermMonthSort     2449.33      (5.5%)     2421.11      (4.4%)   -1.2% ( -10% -    9%)
                HighTerm     1629.92      (6.5%)     1612.72      (6.4%)   -1.1% ( -13% -   12%)
                  IntNRQ      980.43      (9.1%)      973.72      (8.9%)   -0.7% ( -17% -   19%)
                Wildcard     1779.82      (5.7%)     1771.12      (5.5%)   -0.5% ( -11% -   11%)
                 Prefix3     1790.47      (5.9%)     1781.85      (5.8%)   -0.5% ( -11% -   11%)
BrowseDayOfYearSSDVFacets     2038.63      (3.0%)     2032.32      (2.1%)   -0.3% (  -5% -    4%)
   BrowseMonthSSDVFacets     2295.02      (2.5%)     2303.01      (1.9%)    0.3% (  -4% -    4%)
{noformat}
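
For anyone reading the table: the leading figure in the "Pct diff" column appears consistent with the relative change in mean QPS, (after - before) / before. Below is a minimal Java sketch of that arithmetic, using the PKLookup and AndHighLow rows as inputs. The class and method names are made up for illustration, and this is not the benchmark tool's own code; how the tool derives the bracketed range is not shown here.

```java
import java.util.Locale;

public class PctDiff {
    /** Relative change from before to after, in percent (hypothetical helper). */
    static double pctDiff(double before, double after) {
        return (after - before) / before * 100.0;
    }

    public static void main(String[] args) {
        // Mean QPS values copied from the PKLookup and AndHighLow rows above.
        System.out.println(String.format(Locale.ROOT, "PKLookup   %.1f%%", pctDiff(163.94, 123.50)));
        System.out.println(String.format(Locale.ROOT, "AndHighLow %.1f%%", pctDiff(5096.79, 4860.87)));
    }
}
```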

> Lazy loading Lucene FST offheap using mmap
> ------------------------------------------
>
>                 Key: LUCENE-8635
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8635
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: core/FSTs
>         Environment: I used below setup for es_rally tests:
> single node i3.xlarge running ES 6.5
> es_rally was running on another i3.xlarge instance
>            Reporter: Ankit Jain
>            Priority: Major
>         Attachments: offheap.patch, rally_benchmark.xlsx
>
>
> Currently, FST loads all the terms into heap memory during index open. This 
> causes frequent JVM OOM issues if the term size gets big. A better way of 
> doing this will be to lazily load FST using mmap. That ensures only the 
> required terms get loaded into memory.
>  
> Lucene can expose an API for providing the list of fields whose terms should 
> be loaded offheap. I'm planning to take the following approach for this:
>  # Add a boolean property fstOffHeap in FieldInfo
>  # Pass the list of offheap fields to Lucene during index open (ALL can be a 
> special keyword for loading all fields offheap)
>  # Initialize the fstOffHeap property during Lucene index open
>  # FieldReader invokes the default FST constructor or the OffHeap constructor 
> based on the fstOffHeap field
>  
> I created a patch (that loads all fields offheap), did some benchmarks using 
> es_rally and results look good.
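
To illustrate the general idea of lazy loading via mmap that the description above refers to, here is a minimal Java sketch using only the JDK's FileChannel.map. It is not the attached patch and does not use any Lucene API: the class name, the temp file, and the offset are all made up for illustration. The point is only that mapping a file does not copy it onto the Java heap, and the OS pages in the touched regions on demand.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MmapLazyRead {
    /** Maps the whole file read-only and returns the byte at the given offset. */
    static byte readByteAt(Path file, int offset) {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // The mapping lives outside the Java heap; only the pages that are
            // actually accessed need to be brought into memory by the OS.
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            return buf.get(offset);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes a 1 MiB stand-in file (not a real FST) and reads one byte back. */
    static byte demo() {
        try {
            Path tmp = Files.createTempFile("fst-sketch", ".bin");
            tmp.toFile().deleteOnExit();
            byte[] data = new byte[1 << 20];
            data[123_456] = 42; // marker byte at an arbitrary offset
            Files.write(tmp, data);
            return readByteAt(tmp, 123_456);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The trade-off is that each access can incur a page fault instead of a plain heap array read, which may be relevant to lookup-heavy tasks.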



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
