[GitHub] [lucene] romseygeek commented on pull request #12357: Better paging when random reads go backwards

via GitHub Thu, 08 Jun 2023 04:25:09 -0700


romseygeek commented on PR #12357:
URL: https://github.com/apache/lucene/pull/12357#issuecomment-1582409156


   I've only implemented this on `readByte()` so far, as that seems to be the 
method that is affected most.  Random reads of short, int and long values are 
mostly done when binary searching, which is a much less adversarial case than 
step-by-step backwards reading.
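
To illustrate why step-by-step backwards reading is the adversarial case for a buffered reader, here is a minimal sketch. It is not the code from this PR and not Lucene's `BufferedIndexInput`: the class `BackwardPagingSketch`, its `pageBackwards` flag and the `refills` counter are hypothetical names invented for the example, and an in-memory array stands in for the file. The assumption is that a naive refill starts the buffer at the requested position, while a backwards-aware refill places the buffer so that it ends at the requested position.

```java
/** Hypothetical illustration only; not Lucene's BufferedIndexInput. */
final class BackwardPagingSketch {
  private final byte[] file;            // stands in for the on-disk file
  private final byte[] buffer;          // the read buffer ("page")
  private final boolean pageBackwards;  // true = backwards-aware refill
  private long bufferStart = 0;         // file offset of buffer[0]
  private int bufferLength = 0;         // number of valid bytes in buffer
  int refills = 0;                      // how often we had to go to the "disk"

  BackwardPagingSketch(byte[] file, int bufferSize, boolean pageBackwards) {
    this.file = file;
    this.buffer = new byte[bufferSize];
    this.pageBackwards = pageBackwards;
  }

  /** Random-access read of a single byte at an absolute file position. */
  byte readByte(long pos) {
    if (pos < 0 || pos >= file.length) {
      throw new IndexOutOfBoundsException("pos=" + pos);
    }
    if (pos < bufferStart || pos >= bufferStart + bufferLength) {
      refill(pos);
    }
    return buffer[(int) (pos - bufferStart)];
  }

  private void refill(long pos) {
    long start = pos; // naive choice: the buffer begins at the requested position
    if (pageBackwards && bufferLength > 0 && pos < bufferStart) {
      // The read moved backwards: end the buffer at pos so that the bytes
      // just before pos are already loaded for the next backwards step.
      start = Math.max(0, pos - buffer.length + 1);
    }
    bufferStart = start;
    bufferLength = (int) Math.min(buffer.length, file.length - start);
    System.arraycopy(file, (int) start, buffer, 0, bufferLength);
    refills++;
  }

  /** Reads a 4096-byte "file" from the last byte to the first and counts refills. */
  static int countRefillsReadingBackwards(boolean pageBackwards) {
    byte[] file = new byte[4096];
    for (int i = 0; i < file.length; i++) {
      file[i] = (byte) i;
    }
    BackwardPagingSketch in = new BackwardPagingSketch(file, 1024, pageBackwards);
    for (long pos = file.length - 1; pos >= 0; pos--) {
      if (in.readByte(pos) != (byte) pos) {
        throw new AssertionError("wrong byte at " + pos);
      }
    }
    return in.refills;
  }

  public static void main(String[] args) {
    System.out.println("naive refills:     " + countRefillsReadingBackwards(false)); // 4096
    System.out.println("backwards refills: " + countRefillsReadingBackwards(true));  // 5
  }
}
```

With a 1024-byte buffer the naive variant refills on every single byte, whereas the backwards-aware variant refills once per page plus once for the initial read. A binary search jumps far more than one buffer length at a time, so neither placement helps it much, which fits the remark that it is the less adversarial case.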
   
   I ran a wikimedium10k benchmark using `NIOFSDirectory` for both baseline and 
competitor, and got the following results:
   ```
                           Task  QPS baseline   StdDev  QPS my_modified_version   StdDev                 Pct diff  p-value
      BrowseDayOfYearTaxoFacets       2514.67   (5.2%)                  2202.60   (7.0%)   -12.4% ( -23% -    0%)  0.000
      BrowseDayOfYearSSDVFacets       4397.09  (10.1%)                  3875.61  (13.3%)   -11.9% ( -31% -   12%)  0.001
           BrowseDateTaxoFacets       2974.26   (6.5%)                  2641.17   (6.0%)   -11.2% ( -22% -    1%)  0.000
    BrowseRandomLabelSSDVFacets       1189.01   (4.1%)                  1067.62   (6.2%)   -10.2% ( -19% -    0%)  0.000
    BrowseRandomLabelTaxoFacets       1601.99   (5.7%)                  1438.85   (6.2%)   -10.2% ( -20% -    1%)  0.000
          BrowseMonthTaxoFacets       2524.45   (8.1%)                  2273.63   (7.1%)    -9.9% ( -23% -    5%)  0.000
           BrowseDateSSDVFacets       1754.54  (10.7%)                  1616.39   (8.4%)    -7.9% ( -24% -   12%)  0.009
                         IntNRQ       1604.64   (7.7%)                  1485.48   (8.9%)    -7.4% ( -22% -    9%)  0.005
          BrowseMonthSSDVFacets       4473.87  (11.2%)                  4179.28  (10.4%)    -6.6% ( -25% -   16%)  0.053
                        Prefix3        568.51   (3.9%)                   982.22  (14.0%)    72.8% (  52% -   94%)  0.000
                    MedSpanNear        262.90   (2.4%)                   490.78  (10.3%)    86.7% (  72% -  101%)  0.000
                       Wildcard        496.50   (3.3%)                  1031.86  (17.1%)   107.8% (  84% -  132%)  0.000
              HighTermMonthSort        583.14   (4.4%)                  1263.09  (19.9%)   116.6% (  88% -  147%)  0.000
            MedIntervalsOrdered        353.37   (2.3%)                   767.89  (11.9%)   117.3% ( 100% -  134%)  0.000
                    LowSpanNear        254.88   (3.1%)                   624.77  (16.6%)   145.1% ( 121% -  170%)  0.000
                       PKLookup         10.23   (1.7%)                    25.13   (8.6%)   145.6% ( 133% -  158%)  0.000
                     OrHighHigh        307.39   (3.1%)                   762.93  (18.4%)   148.2% ( 122% -  175%)  0.000
                      OrHighMed        384.51   (3.3%)                   958.60  (19.5%)   149.3% ( 122% -  178%)  0.000
                     HighPhrase        304.14   (2.9%)                   765.04  (14.6%)   151.5% ( 130% -  174%)  0.000
                   HighSpanNear        317.72   (3.1%)                   804.30  (18.0%)   153.1% ( 127% -  179%)  0.000
                    AndHighHigh        398.97   (3.4%)                  1012.15  (21.7%)   153.7% ( 124% -  185%)  0.000
          HighTermDayOfYearSort        582.77   (2.7%)                  1495.18  (23.5%)   156.6% ( 126% -  187%)  0.000
                       HighTerm        798.23   (2.8%)                  2094.10  (22.5%)   162.3% ( 133% -  193%)  0.000
                MedSloppyPhrase        414.51   (2.7%)                  1089.03  (28.6%)   162.7% ( 127% -  199%)  0.000
                        Respell         68.25   (2.2%)                   187.81  (14.7%)   175.2% ( 154% -  196%)  0.000
                         Fuzzy2         14.66   (1.7%)                    41.38  (13.1%)   182.3% ( 164% -  200%)  0.000
               HighSloppyPhrase        331.71   (2.1%)                   946.59  (22.2%)   185.4% ( 157% -  214%)  0.000
            LowIntervalsOrdered        823.97   (2.7%)                  2387.93  (19.5%)   189.8% ( 163% -  217%)  0.000
                        MedTerm        770.00   (4.3%)                  2234.31  (30.4%)   190.2% ( 149% -  234%)  0.000
                      LowPhrase        358.61   (3.0%)                  1080.46  (26.5%)   201.3% ( 166% -  238%)  0.000
                      OrHighLow        365.95   (2.8%)                  1106.13  (23.3%)   202.3% ( 171% -  235%)  0.000
                     AndHighMed        352.26   (2.5%)                  1135.10  (26.1%)   222.2% ( 189% -  257%)  0.000
           HighIntervalsOrdered        295.50   (2.7%)                   971.90  (32.3%)   228.9% ( 188% -  271%)  0.000
                      MedPhrase        317.77   (2.7%)                  1083.18  (29.9%)   240.9% ( 202% -  281%)  0.000
                        LowTerm        932.40   (2.8%)                  3290.65  (24.1%)   252.9% ( 219% -  287%)  0.000
                         Fuzzy1         37.84   (1.6%)                   137.03  (17.0%)   262.1% ( 239% -  285%)  0.000
                LowSloppyPhrase        313.05   (3.1%)                  1253.27  (30.8%)   300.3% ( 258% -  344%)  0.000
                     AndHighLow        404.61   (2.6%)                  1947.42  (38.5%)   381.3% ( 331% -  433%)  0.000
   ```
   
   I'm not sure yet what is slowing down the facet tasks (roughly 7-12% in this 
run), so I will dig further into that, but it's a clear win for terms-based 
queries.
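
For reference when reading the table, the `Pct diff` figures are consistent with the relative change in mean QPS between the two runs. The sketch below recomputes three of them from the QPS columns above. The class and method names (`PctDiffSketch`, `pctDiff`) are made up for this example, and how the benchmark tool derives the bracketed range and the p-value is not shown here.

```java
import java.util.Locale;

/** Hypothetical helper: recomputes "Pct diff" from the two mean QPS columns. */
final class PctDiffSketch {
  /** Relative change of the modified run versus the baseline, in percent. */
  static double pctDiff(double baselineQps, double modifiedQps) {
    return 100.0 * (modifiedQps - baselineQps) / baselineQps;
  }

  public static void main(String[] args) {
    // QPS values copied from the table above.
    System.out.println(String.format(Locale.ROOT, "BrowseDayOfYearTaxoFacets %.1f%%", pctDiff(2514.67, 2202.60)));
    System.out.println(String.format(Locale.ROOT, "Prefix3 %.1f%%", pctDiff(568.51, 982.22)));
    System.out.println(String.format(Locale.ROOT, "AndHighLow %.1f%%", pctDiff(404.61, 1947.42)));
  }
}
```

A value of 381.3% for `AndHighLow` therefore means the modified build served nearly five times the baseline QPS, not 3.8 times.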


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

