Hard to read on the phone, but is that a 482% speed up I saw??!

On Thu, Sep 23, 2021, 1:28 PM Greg Miller (Jira) <[email protected]> wrote:
>
>     [
> https://issues.apache.org/jira/browse/LUCENE-10062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17419349#comment-17419349
> ]
>
> Greg Miller commented on LUCENE-10062:
> --------------------------------------
>
> I re-ran {{luceneutil}} benchmarks {{wikimedium10m}} since [~mikemccand]
> added new faceting tasks (thanks Mike!). Looks like there's a nice
> improvement on these new faceting tasks as well with this change (and no
> regressions anywhere else that I see).
>
> I was waiting to iterate on my PR until I was able to run these new
> benchmarking tasks, but it seems like there's enough benefit to this change
> to pick it back up.
>
> {noformat}
> Task                      QPS baseline   StdDev QPS candidate   StdDev Pct diff                 p-value
> HighTermDayOfYearSort            70.02  (13.7%)         68.45   (9.7%)    -2.2% ( -22% -  24%)   0.551
> MedTerm                        1300.90   (5.5%)       1275.97   (6.7%)    -1.9% ( -13% -  10%)   0.324
> HighTerm                       1953.46   (5.8%)       1925.79   (7.9%)    -1.4% ( -14% -  13%)   0.518
> HighTermTitleBDVSort            122.35  (15.6%)        120.86  (14.9%)    -1.2% ( -27% -  34%)   0.801
> TermDTSort                      133.47   (8.7%)        131.86   (7.4%)    -1.2% ( -15% -  16%)   0.637
> LowTerm                        1636.13   (5.5%)       1622.34   (7.4%)    -0.8% ( -12% -  12%)   0.682
> Prefix3                          25.69   (6.0%)         25.48   (6.3%)    -0.8% ( -12% -  12%)   0.676
> LowSpanNear                     118.02   (2.1%)        117.31   (1.8%)    -0.6% (  -4% -   3%)   0.326
> HighTermMonthSort               140.17   (9.8%)        139.47   (9.9%)    -0.5% ( -18% -  21%)   0.872
> AndHighHigh                      49.17   (3.1%)         48.92   (2.7%)    -0.5% (  -6% -   5%)   0.584
> HighSpanNear                     25.54   (2.7%)         25.41   (2.2%)    -0.5% (  -5% -   4%)   0.529
> AndHighLow                      556.68   (5.8%)        554.80   (5.4%)    -0.3% ( -10% -  11%)   0.848
> BrowseDayOfYearSSDVFacets        16.53   (2.5%)         16.47   (2.4%)    -0.3% (  -5% -   4%)   0.674
> IntNRQ                           87.76   (2.0%)         87.49   (2.1%)    -0.3% (  -4% -   3%)   0.634
> MedSpanNear                      31.11   (2.2%)         31.04   (1.6%)    -0.2% (  -3% -   3%)   0.714
> OrNotHighLow                    765.10   (4.5%)        763.60   (5.4%)    -0.2% (  -9% -  10%)   0.901
> MedPhrase                       160.05   (3.1%)        159.83   (2.9%)    -0.1% (  -5% -   6%)   0.885
> HighSloppyPhrase                 27.67   (3.1%)         27.64   (3.0%)    -0.1% (  -6% -   6%)   0.915
> LowPhrase                        61.12   (3.2%)         61.05   (3.2%)    -0.1% (  -6% -   6%)   0.921
> OrHighMed                        71.85   (2.9%)         71.82   (2.1%)    -0.0% (  -4% -   5%)   0.963
> HighPhrase                       29.40   (2.3%)         29.39   (2.8%)    -0.0% (  -5% -   5%)   0.971
> Fuzzy2                           32.58   (4.3%)         32.57   (6.1%)    -0.0% (  -9% -  10%)   0.992
> LowIntervalsOrdered             150.30   (1.9%)        150.28   (1.9%)    -0.0% (  -3% -   3%)   0.986
> AndHighMed                      151.32   (3.9%)        151.31   (4.1%)    -0.0% (  -7% -   8%)   0.993
> OrHighHigh                       23.90   (2.3%)         23.91   (1.9%)     0.0% (  -4% -   4%)   0.970
> OrHighNotLow                    579.17   (5.1%)        579.35   (6.4%)     0.0% ( -10% -  12%)   0.986
> MedIntervalsOrdered              86.93   (1.7%)         86.98   (1.9%)     0.1% (  -3% -   3%)   0.913
> OrHighNotHigh                   536.17   (5.6%)        536.57   (6.6%)     0.1% ( -11% -  12%)   0.969
> OrNotHighHigh                   787.07   (6.5%)        787.96   (8.1%)     0.1% ( -13% -  15%)   0.961
> OrNotHighMed                    687.97   (4.7%)        688.77   (6.9%)     0.1% ( -10% -  12%)   0.950
> MedSloppyPhrase                  68.62   (2.8%)         68.74   (2.7%)     0.2% (  -5% -   5%)   0.838
> LowSloppyPhrase                 130.37   (2.6%)        130.62   (2.2%)     0.2% (  -4% -   5%)   0.797
> OrHighLow                       440.44   (4.1%)        441.33   (4.1%)     0.2% (  -7% -   8%)   0.877
> Wildcard                        122.01   (5.2%)        122.35   (5.3%)     0.3% (  -9% -  11%)   0.867
> HighIntervalsOrdered             14.24   (2.2%)         14.34   (2.1%)     0.6% (  -3% -   5%)   0.350
> Respell                          52.04   (2.2%)         52.48   (2.0%)     0.8% (  -3% -   5%)   0.209
> OrHighNotMed                    674.76   (4.8%)        680.97   (8.0%)     0.9% ( -11% -  14%)   0.659
> PKLookup                        153.45   (4.3%)        155.13   (3.8%)     1.1% (  -6% -   9%)   0.394
> Fuzzy1                           56.57   (9.1%)         57.76   (6.7%)     2.1% ( -12% -  19%)   0.406
> BrowseMonthSSDVFacets            19.59  (10.4%)         20.03   (6.7%)     2.3% ( -13% -  21%)   0.413
> AndHighHighDayTaxoFacets         19.22   (1.6%)         22.13   (2.2%)    15.1% (  11% -  19%)   0.000
> AndHighMedDayTaxoFacets          25.62   (1.5%)         29.93   (2.2%)    16.8% (  12% -  20%)   0.000
> MedTermDayTaxoFacets             12.96   (2.2%)         18.99   (3.4%)    46.5% (  39% -  53%)   0.000
> OrHighMedDayTaxoFacets            3.97   (2.0%)          5.81   (4.3%)    46.5% (  39% -  53%)   0.000
> BrowseMonthTaxoFacets             2.59  (10.9%)         11.16  (35.8%)   330.4% ( 255% - 423%)   0.000
> BrowseDateTaxoFacets              2.44   (9.7%)         13.12  (51.8%)   438.1% ( 343% - 553%)   0.000
> BrowseDayOfYearTaxoFacets         2.44   (9.7%)         13.13  (51.7%)   438.2% ( 343% - 552%)   0.000
> {noformat}
>
> > Explore using SORTED_NUMERIC doc values to encode taxonomy ordinals for faceting
> > --------------------------------------------------------------------------------
> >
> >                 Key: LUCENE-10062
> >                 URL: https://issues.apache.org/jira/browse/LUCENE-10062
> >             Project: Lucene - Core
> >          Issue Type: Improvement
> >          Components: modules/facet
> >            Reporter: Greg Miller
> >            Assignee: Greg Miller
> >            Priority: Minor
> >          Time Spent: 1h 40m
> >  Remaining Estimate: 0h
> >
> > We currently encode taxonomy ordinals using varint style packing in a
> > binary doc values field. I suspect there have been a number of improvements
> > to SortedNumericDocValues since taxonomy faceting was first introduced, and
> > I plan to explore replacing the custom binary format we have today with a
> > SORTED_NUMERIC type dv field instead.
> > I'll report benchmark results and index size impact here.
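
For anyone skimming the thread: the issue description above mentions "varint style packing" of taxonomy ordinals in a binary doc values field. As a rough illustration of what that kind of packing involves, here is a generic delta + varint sketch in Python. It is not Lucene's encoder or on-disk format; the function names and the low-bits-first byte order are assumptions made purely for the example.

```python
def encode_ordinals(ordinals):
    """Delta-encode sorted ordinals, then pack each delta as a varint.

    Hypothetical helper for illustration only; not Lucene's actual format.
    """
    out = bytearray()
    prev = 0
    for ordinal in sorted(ordinals):
        delta = ordinal - prev
        prev = ordinal
        # 7 payload bits per byte, low bits first (an assumed byte order);
        # a set high bit means "more bytes follow".
        while delta >= 0x80:
            out.append((delta & 0x7F) | 0x80)
            delta >>= 7
        out.append(delta)
    return bytes(out)


def decode_ordinals(data):
    """Reverse encode_ordinals: unpack varints and undo the delta coding."""
    ordinals = []
    prev = 0
    value = 0
    shift = 0
    for byte in data:
        value |= (byte & 0x7F) << shift
        if byte & 0x80:
            # Continuation bit set: keep accumulating this delta.
            shift += 7
        else:
            # Last byte of this delta: rebuild the absolute ordinal.
            prev += value
            ordinals.append(prev)
            value = 0
            shift = 0
    return ordinals
```

With a packed binary field, a decode loop of this sort has to run for every document at faceting time; the proposal in the issue is to let a SORTED_NUMERIC doc values field hold the ordinals instead of a custom binary format.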
