[jira] [Comment Edited] (LUCENE-8635) Lazy loading Lucene FST offheap using mmap

Ankit Jain (JIRA) Sun, 27 Jan 2019 14:23:55 -0800


    [ https://issues.apache.org/jira/browse/LUCENE-8635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16753609#comment-16753609 ]


Ankit Jain edited comment on LUCENE-8635 at 1/27/19 10:14 PM:
--------------------------------------------------------------

Results for bigger data sets:

{code:title=wikimedium10m, java ...... -DFST.offheap=true|borderStyle=solid}
                     Task    QPS baseline      StdDev    QPS candidate      StdDev                Pct diff
                 PKLookup      117.59      (3.0%)      107.48      (2.3%)   -8.6% ( -13% -   -3%)
             OrHighNotMed     1085.05      (2.1%)     1056.43      (2.2%)   -2.6% (  -6% -    1%)
             OrNotHighLow      976.94      (2.4%)      955.32      (1.8%)   -2.2% (  -6% -    2%)
             OrHighNotLow     1152.58      (2.6%)     1128.25      (2.0%)   -2.1% (  -6% -    2%)
                   Fuzzy1       83.10      (2.6%)       81.54      (2.5%)   -1.9% (  -6% -    3%)
                   IntNRQ       88.53     (16.2%)       86.92     (14.7%)   -1.8% ( -28% -   34%)
            OrNotHighHigh      886.10      (1.7%)      870.26      (1.4%)   -1.8% (  -4% -    1%)
            OrHighNotHigh      838.32      (1.8%)      824.15      (1.9%)   -1.7% (  -5% -    2%)
    BrowseMonthTaxoFacets     8099.58      (2.0%)     7968.65      (1.8%)   -1.6% (  -5% -    2%)
                   Fuzzy2       55.95      (2.7%)       55.08      (2.5%)   -1.6% (  -6% -    3%)
             OrNotHighMed      764.40      (2.3%)      752.56      (1.7%)   -1.5% (  -5% -    2%)
BrowseDayOfYearTaxoFacets     8081.37      (2.1%)     7957.27      (2.7%)   -1.5% (  -6% -    3%)
                  LowTerm     1941.88      (5.2%)     1912.71      (4.0%)   -1.5% ( -10% -    8%)
        HighTermMonthSort       78.12     (10.8%)       76.99     (14.3%)   -1.4% ( -23% -   26%)
                  Respell       61.23      (2.7%)       60.57      (2.7%)   -1.1% (  -6% -    4%)
                 HighTerm     1526.16      (3.1%)     1510.23      (1.8%)   -1.0% (  -5% -    4%)
                  MedTerm     1814.44      (3.7%)     1797.69      (2.1%)   -0.9% (  -6% -    5%)
                OrHighLow      443.93      (2.4%)      439.92      (2.5%)   -0.9% (  -5% -    4%)
               AndHighLow      577.60      (2.0%)      573.43      (1.4%)   -0.7% (  -4% -    2%)
                 Wildcard       62.79      (5.8%)       62.54      (6.1%)   -0.4% ( -11% -   12%)
BrowseDayOfYearSSDVFacets       11.56      (8.0%)       11.55      (8.2%)   -0.0% ( -15% -   17%)
                  Prefix3      165.76      (8.7%)      165.70      (9.2%)   -0.0% ( -16% -   19%)
              MedSpanNear       51.40      (2.3%)       51.48      (2.5%)    0.2% (  -4% -    5%)
    BrowseMonthSSDVFacets       14.45     (13.6%)       14.47     (13.2%)    0.2% ( -23% -   31%)
    HighTermDayOfYearSort       44.98      (6.8%)       45.05      (5.3%)    0.2% ( -11% -   13%)
                OrHighMed      111.81      (3.0%)      112.01      (2.8%)    0.2% (  -5% -    6%)
              LowSpanNear       47.14      (2.4%)       47.24      (2.5%)    0.2% (  -4% -    5%)
          MedSloppyPhrase       48.25      (1.9%)       48.37      (2.3%)    0.2% (  -3% -    4%)
          LowSloppyPhrase       35.36      (2.2%)       35.46      (2.5%)    0.3% (  -4% -    5%)
               AndHighMed      144.05      (3.6%)      144.53      (2.7%)    0.3% (  -5% -    6%)
             HighSpanNear        6.92      (3.5%)        6.95      (3.5%)    0.5% (  -6% -    7%)
                MedPhrase       25.88      (2.4%)       26.00      (1.4%)    0.5% (  -3% -    4%)
              AndHighHigh       38.77      (4.0%)       38.98      (3.9%)    0.5% (  -7% -    8%)
               OrHighHigh       27.47      (3.2%)       27.63      (3.1%)    0.6% (  -5% -    7%)
                LowPhrase       91.71      (4.3%)       92.56      (3.5%)    0.9% (  -6% -    9%)
         HighSloppyPhrase       18.28      (3.2%)       18.45      (3.6%)    0.9% (  -5% -    8%)
               HighPhrase       20.07      (3.9%)       20.35      (1.3%)    1.4% (  -3% -    6%)
     BrowseDateTaxoFacets        2.37      (0.4%)        2.41      (0.2%)    1.4% (   0% -    2%)
{code}
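
For readers checking the numbers: the "Pct diff" column appears to be the relative change of the candidate QPS against the baseline QPS. The sketch below is only an illustration of that arithmetic, under that assumption; the class and method names (PctDiffSketch, pctDiff, format) are made up here and are not part of the benchmark tooling. It does not try to reproduce the range shown in parentheses, and rows with very small QPS values may not match exactly, presumably because the printed QPS values are rounded.

```java
import java.util.Locale;

/** Illustrative only: relative QPS change of candidate vs. baseline, in percent. */
final class PctDiffSketch {

    /** Assumed formula: 100 * (candidate - baseline) / baseline. */
    static double pctDiff(double baselineQps, double candidateQps) {
        return 100.0 * (candidateQps - baselineQps) / baselineQps;
    }

    /** Formats with one decimal place, as in the table. */
    static String format(double pct) {
        return String.format(Locale.ROOT, "%.1f%%", pct);
    }

    public static void main(String[] args) {
        // PKLookup row: baseline 117.59 QPS, candidate 107.48 QPS
        System.out.println(format(pctDiff(117.59, 107.48))); // prints -8.6%
    }
}
```

With the PKLookup row this gives -8.6%, which matches the largest regression reported above.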


> Lazy loading Lucene FST offheap using mmap
> ------------------------------------------
>
>                 Key: LUCENE-8635
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8635
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: core/FSTs
>         Environment: I used the setup below for the es_rally tests:
> a single i3.xlarge node running ES 6.5
> es_rally ran on a separate i3.xlarge instance
>            Reporter: Ankit Jain
>            Priority: Major
>         Attachments: fst-offheap-ra-rev.patch, offheap.patch, 
> optional_offheap_ra.patch, ra.patch, rally_benchmark.xlsx
>
>
> Currently, the FST loads all terms into heap memory when the index is 
> opened. This causes frequent JVM OOM errors when the terms data grows large. 
> A better approach is to load the FST lazily using mmap, so that only the 
> required terms are loaded into memory.
>  
> Lucene can expose an API for providing the list of fields whose terms should 
> be loaded offheap. I'm planning to take the following approach:
>  # Add a boolean property fstOffHeap to FieldInfo
>  # Pass the list of offheap fields to Lucene during index open (ALL can be a 
> special keyword for loading all fields offheap)
>  # Initialize the fstOffHeap property during Lucene index open
>  # FieldReader invokes the default FST constructor or the offheap constructor 
> based on the fstOffHeap property
>  
> I created a patch (that loads all fields offheap), ran some benchmarks using 
> es_rally, and the results look good.
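
To make the idea in the description concrete, here is a minimal sketch using only java.nio. It is not Lucene's FST API and not the code in the attached patches; ByteSource, onHeap, offHeap and open are hypothetical names introduced for illustration. It contrasts reading a file fully onto the heap with memory-mapping it so that the OS pages data in only when it is touched, and it picks between the two based on a boolean that stands in for the proposed fstOffHeap property.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical sketch; not Lucene's FST API and not the attached patch. */
final class OffHeapBytesSketch {

    /** Hypothetical read-only view over serialized bytes. */
    interface ByteSource {
        byte byteAt(long pos);
        long length();
    }

    /** Eager variant: copies the whole file onto the Java heap up front. */
    static ByteSource onHeap(Path file) throws IOException {
        final byte[] all = Files.readAllBytes(file);
        return new ByteSource() {
            @Override public byte byteAt(long pos) { return all[Math.toIntExact(pos)]; }
            @Override public long length() { return all.length; }
        };
    }

    /** Lazy variant: maps the file; pages are loaded by the OS only when read. */
    static ByteSource offHeap(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // A single mapping is limited to Integer.MAX_VALUE bytes;
            // larger files would need several mappings.
            final MappedByteBuffer buf =
                ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            // The mapping remains valid after the channel is closed.
            return new ByteSource() {
                @Override public byte byteAt(long pos) { return buf.get(Math.toIntExact(pos)); }
                @Override public long length() { return buf.capacity(); }
            };
        }
    }

    /** Stand-in for the constructor choice driven by the proposed fstOffHeap flag. */
    static ByteSource open(Path file, boolean fstOffHeap) throws IOException {
        return fstOffHeap ? offHeap(file) : onHeap(file);
    }
}
```

Calling OffHeapBytesSketch.open(path, true) returns the mapped view and open(path, false) the heap copy; both expose the same bytes, so the caller does not need to know which one it got. The trade-off is that mapped reads can hit page faults, which is consistent with PKLookup being the task that slows down most in the results above.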



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
