[ https://issues.apache.org/jira/browse/LUCENE-9450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17179235#comment-17179235 ]
Gautam Worah edited comment on LUCENE-9450 at 8/17/20, 9:29 PM:
----------------------------------------------------------------

Benchmarks output:
{code:java}
Task                      QPS baseline   StdDev  QPS my_modified_version   StdDev  Pct diff
Respell                         389.13  (11.9%)                   341.06  (17.2%)    -12.4% ( -37% -   18%)
HighSloppyPhrase                354.72   (5.8%)                   344.69   (4.6%)     -2.8% ( -12% -    8%)
MedSloppyPhrase                 754.17   (3.7%)                   734.47   (9.1%)     -2.6% ( -14% -   10%)
LowPhrase                       422.80   (4.3%)                   413.10   (7.4%)     -2.3% ( -13% -    9%)
OrHighMed                       666.99   (4.2%)                   654.89   (6.0%)     -1.8% ( -11% -    8%)
Prefix3                         501.95   (7.1%)                   494.34   (6.9%)     -1.5% ( -14% -   13%)
MedSpanNear                     924.07   (4.4%)                   914.55   (5.6%)     -1.0% ( -10% -    9%)
HighIntervalsOrdered            913.74   (2.0%)                   905.72   (2.2%)     -0.9% (  -4% -    3%)
MedTerm                        3841.01   (3.0%)                  3811.27   (3.2%)     -0.8% (  -6% -    5%)
MedPhrase                       912.31   (2.3%)                   906.87   (2.7%)     -0.6% (  -5% -    4%)
LowSpanNear                     769.51  (13.1%)                   766.22  (12.4%)     -0.4% ( -22% -   28%)
BrowseDayOfYearSSDVFacets      2028.19   (2.8%)                  2022.17   (2.2%)     -0.3% (  -5% -    4%)
IntNRQ                         1446.93   (2.4%)                  1442.66   (1.6%)     -0.3% (  -4% -    3%)
HighTermMonthSort              2631.06   (2.7%)                  2628.64   (4.9%)     -0.1% (  -7% -    7%)
AndHighLow                     2713.80   (3.0%)                  2711.42   (3.4%)     -0.1% (  -6% -    6%)
HighTerm                       2785.67   (2.7%)                  2783.94   (3.6%)     -0.1% (  -6% -    6%)
HighSpanNear                    595.30  (11.4%)                   595.48  (10.0%)      0.0% ( -19% -   24%)
BrowseMonthSSDVFacets          2367.18   (2.2%)                  2369.20   (2.1%)      0.1% (  -4% -    4%)
AndHighHigh                     885.75   (3.4%)                   887.16   (3.9%)      0.2% (  -6% -    7%)
Wildcard                        730.34  (11.7%)                   732.30  (12.2%)      0.3% ( -21% -   27%)
HighPhrase                      655.83   (3.5%)                   658.13   (2.4%)      0.4% (  -5% -    6%)
HighTermDayOfYearSort          1724.90   (6.1%)                  1731.23   (5.8%)      0.4% ( -10% -   13%)
LowTerm                        4271.57   (2.7%)                  4290.84   (3.9%)      0.5% (  -5% -    7%)
PKLookup                        243.00   (2.9%)                   244.62   (1.3%)      0.7% (  -3% -    5%)
LowSloppyPhrase                1702.96   (2.8%)                  1718.94   (3.4%)      0.9% (  -5% -    7%)
Fuzzy1                          398.68   (7.4%)                   403.04   (4.8%)      1.1% ( -10% -   14%)
OrHighHigh                      411.49   (9.1%)                   417.82   (5.8%)      1.5% ( -12% -   18%)
OrHighLow                       707.11   (4.2%)                   718.78   (6.0%)      1.7% (  -8% -   12%)
AndHighMed                     1110.43   (4.5%)                  1134.33   (3.4%)      2.2% (  -5% -   10%)
Fuzzy2                           46.38  (22.4%)                    48.94  (19.1%)      5.5% ( -29% -   60%)
BrowseDateTaxoFacets           2996.90   (4.3%)                  3212.31   (5.2%)      7.2% (  -2% -   17%)
BrowseDayOfYearTaxoFacets      2594.45   (2.8%)                  2785.41   (3.7%)      7.4% (   0% -   14%)
BrowseMonthTaxoFacets          2920.16   (3.7%)                  3139.41   (3.9%)      7.5% (   0% -   15%)
{code}
My [localrun.py|https://pastebin.com/YAXZQp4z] script

> Taxonomy index should use DocValues not StoredFields
> ----------------------------------------------------
>
>                 Key: LUCENE-9450
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9450
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/facet
>    Affects Versions: 8.5.2
>            Reporter: Gautam Worah
>            Priority: Minor
>              Labels: performance
>         Attachments: wip_taxonomy_patch
>
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The taxonomy index that maps binning labels to ordinals was created before
> Lucene added BinaryDocValues.
> I've attached a WIP patch (does not pass tests currently)
> Issue suggested by [~mikemccand]
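
For anyone reading the benchmark output above: the Pct diff column appears to be the plain relative change in mean QPS from baseline to my_modified_version. The sketch below recomputes it for a few rows copied from the table. The helper name `pct_diff` is made up for illustration and is not taken from the benchmark tooling, and the range shown in parentheses in the table is not reproduced here.

```python
def pct_diff(baseline_qps: float, candidate_qps: float) -> float:
    """Relative change of candidate QPS versus baseline QPS, in percent.

    Hypothetical helper for illustration; not the benchmark tool's own code.
    """
    return 100.0 * (candidate_qps - baseline_qps) / baseline_qps


# (baseline QPS, my_modified_version QPS), copied from the benchmark table
rows = {
    "Respell": (389.13, 341.06),
    "BrowseDateTaxoFacets": (2996.90, 3212.31),
    "BrowseDayOfYearTaxoFacets": (2594.45, 2785.41),
    "BrowseMonthTaxoFacets": (2920.16, 3139.41),
}

for task, (baseline, candidate) in rows.items():
    print(f"{task:<26} {pct_diff(baseline, candidate):6.1f}%")
```

The three taxonomy facet tasks come out at roughly 7% faster, matching the table; most other tasks move by less than their reported StdDev.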