[ https://issues.apache.org/jira/browse/LUCENE-9378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17113489#comment-17113489 ]
Michael Sokolov commented on LUCENE-9378:
-----------------------------------------

I updated luceneutil to enable sorting by a BinaryDocValues field holding the title. I then ran a test across a wide range of tasks, comparing branch_8_4 (before) and branch_8_5 (after). In this test, every task has a title sort criterion applied. Interestingly, BrowseDateTaxoFacets shows a big improvement! Otherwise, though, we see a pretty significant degradation in performance.

||Task||QPS before (branch_8_4)||StdDev||QPS after (branch_8_5)||StdDev||Pct diff||
|MedTerm|9.40|(3.1%)|1.68|(0.4%)|-82.1% ( -83% - -81%)|
|LowTerm|20.17|(1.8%)|3.74|(0.4%)|-81.5% ( -82% - -80%)|
|Wildcard|5.25|(3.3%)|1.02|(0.4%)|-80.6% ( -81% - -79%)|
|Prefix3|12.83|(2.3%)|2.52|(0.4%)|-80.4% ( -81% - -79%)|
|OrHighLow|3.07|(4.1%)|0.71|(0.6%)|-76.9% ( -78% - -75%)|
|HighTerm|2.79|(4.6%)|0.72|(0.5%)|-74.1% ( -75% - -72%)|
|Fuzzy2|19.88|(2.7%)|5.16|(0.5%)|-74.0% ( -75% - -72%)|
|IntNRQ|329.04|(1.4%)|85.42|(0.4%)|-74.0% ( -74% - -73%)|
|AndHighHigh|5.44|(3.1%)|1.52|(0.6%)|-72.1% ( -73% - -70%)|
|AndHighMed|7.85|(2.4%)|2.55|(0.6%)|-67.4% ( -68% - -65%)|
|LowSloppyPhrase|5.11|(2.4%)|1.90|(0.6%)|-62.9% ( -64% - -61%)|
|OrHighHigh|1.47|(4.2%)|0.56|(1.0%)|-61.7% ( -64% - -58%)|
|LowPhrase|8.21|(1.9%)|3.23|(0.6%)|-60.6% ( -61% - -59%)|
|HighSloppyPhrase|1.48|(3.2%)|0.61|(0.9%)|-58.9% ( -61% - -56%)|
|Fuzzy1|112.25|(5.7%)|46.46|(1.1%)|-58.6% ( -61% - -54%)|
|MedSloppyPhrase|2.16|(3.0%)|0.94|(0.7%)|-56.5% ( -58% - -54%)|
|OrHighMed|1.23|(4.4%)|0.54|(1.2%)|-55.9% ( -58% - -52%)|
|MedPhrase|2.87|(2.6%)|1.77|(1.0%)|-38.5% ( -40% - -35%)|
|HighPhrase|0.28|(3.3%)|0.21|(1.9%)|-24.1% ( -28% - -19%)|
|HighIntervalsOrdered|0.48|(4.7%)|0.41|(2.9%)|-16.2% ( -22% - -9%)|
|Respell|99.24|(1.7%)|86.51|(0.8%)|-12.8% ( -15% - -10%)|
|AndHighLow|302.35|(2.5%)|276.95|(2.6%)|-8.4% ( -13% - -3%)|
|BrowseDayOfYearTaxoFacets|4202.04|(3.0%)|4057.48|(2.6%)|-3.4% ( -8% - 2%)|
|BrowseMonthTaxoFacets|4160.07|(2.8%)|4080.02|(2.2%)|-1.9% ( -6% - 3%)|
|BrowseDayOfYearSSDVFacets|3.29|(4.9%)|3.29|(7.1%)|0.0% ( -11% - 12%)|
|BrowseMonthSSDVFacets|3.68|(15.7%)|3.69|(16.9%)|0.3% ( -27% - 39%)|
|BrowseDateTaxoFacets|0.54|(6.3%)|0.96|(5.8%)|77.3% ( 61% - 95%)|

> Configurable compression for BinaryDocValues
> --------------------------------------------
>
>                 Key: LUCENE-9378
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9378
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Viral Gandhi
>            Priority: Minor
>
> Lucene 8.5.1 includes a change to always [compress BinaryDocValues|https://issues.apache.org/jira/browse/LUCENE-9211]. This caused a ~30% reduction in our red-line QPS (throughput).
> We think users should be given some way to opt in to this compression feature instead of having it always enabled, since it can have a substantial query-time cost, as we saw during our upgrade. [~mikemccand] suggested one possible approach: introduce a *mode* in Lucene84DocValuesFormat (COMPRESSED and UNCOMPRESSED) and let users create a custom Codec that subclasses the default Codec and picks the format they want.
> The idea is similar to Lucene50StoredFieldsFormat, which has two modes, Mode.BEST_SPEED and Mode.BEST_COMPRESSION.
> Here is a related issue for adding a benchmark covering BINARY doc values query-time performance: [https://github.com/mikemccand/luceneutil/issues/61]
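To make the trade-off behind this request concrete, here is a toy model of an opt-in compression mode. It is a sketch under stated assumptions, not Lucene code. The `Mode` enum, the `BinaryValuesModeSketch` class and the block size of 32 are all hypothetical. `java.util.zip.Deflater` stands in for whatever compressor Lucene actually uses. In this model, values are grouped into fixed-size blocks. In COMPRESSED mode each block is deflated, so reading a single value requires inflating its whole block. That extra per-lookup work is the kind of query-time cost the numbers above suggest.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Toy model of opt-in block compression for binary values (hypothetical; not Lucene code). */
public class BinaryValuesModeSketch {
  /** Hypothetical mode switch, named after the COMPRESSED/UNCOMPRESSED idea in the issue. */
  enum Mode { UNCOMPRESSED, COMPRESSED }

  static final int BLOCK_SIZE = 32; // values per block; arbitrary choice for this sketch

  private final Mode mode;
  private final List<byte[]> blocks = new ArrayList<>();      // raw or deflated bytes per block
  private final List<int[]> offsets = new ArrayList<>();      // value start offsets per block, plus end
  private final List<Integer> rawLengths = new ArrayList<>(); // uncompressed size of each block

  BinaryValuesModeSketch(Mode mode, List<byte[]> values) {
    this.mode = mode;
    for (int start = 0; start < values.size(); start += BLOCK_SIZE) {
      int end = Math.min(start + BLOCK_SIZE, values.size());
      ByteArrayOutputStream buf = new ByteArrayOutputStream();
      int[] offs = new int[end - start + 1];
      for (int i = start; i < end; i++) {
        offs[i - start] = buf.size();
        byte[] v = values.get(i);
        buf.write(v, 0, v.length);
      }
      offs[end - start] = buf.size();
      byte[] raw = buf.toByteArray();
      offsets.add(offs);
      rawLengths.add(raw.length);
      blocks.add(mode == Mode.COMPRESSED ? deflate(raw) : raw); // compression only if opted in
    }
  }

  private static byte[] deflate(byte[] raw) {
    Deflater deflater = new Deflater();
    deflater.setInput(raw);
    deflater.finish();
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] chunk = new byte[1024];
    while (!deflater.finished()) {
      int n = deflater.deflate(chunk);
      out.write(chunk, 0, n);
    }
    deflater.end();
    return out.toByteArray();
  }

  private static byte[] inflate(byte[] compressed, int rawLength) {
    Inflater inflater = new Inflater();
    inflater.setInput(compressed);
    byte[] raw = new byte[rawLength];
    try {
      int off = 0;
      while (off < rawLength && !inflater.finished()) {
        off += inflater.inflate(raw, off, rawLength - off);
      }
    } catch (DataFormatException e) {
      throw new IllegalStateException("corrupt block", e);
    } finally {
      inflater.end();
    }
    return raw;
  }

  /** Reads value #docId; COMPRESSED mode must inflate the entire block first. */
  byte[] get(int docId) {
    int b = docId / BLOCK_SIZE;
    byte[] block = mode == Mode.COMPRESSED ? inflate(blocks.get(b), rawLengths.get(b)) : blocks.get(b);
    int[] offs = offsets.get(b);
    int i = docId % BLOCK_SIZE;
    return Arrays.copyOfRange(block, offs[i], offs[i + 1]);
  }

  /** Bytes held for value data only (offsets are ignored in this toy). */
  long storedBytes() {
    long total = 0;
    for (byte[] block : blocks) total += block.length;
    return total;
  }

  public static void main(String[] args) {
    List<byte[]> titles = new ArrayList<>();
    for (int i = 0; i < 100; i++) titles.add(("Title number " + i).getBytes(StandardCharsets.UTF_8));
    for (Mode mode : Mode.values()) {
      BinaryValuesModeSketch dv = new BinaryValuesModeSketch(mode, titles);
      System.out.println(mode + ": " + dv.storedBytes() + " bytes, doc 42 = "
          + new String(dv.get(42), StandardCharsets.UTF_8));
    }
  }
}
```

Running `main` prints the stored size and one decoded value for each mode. The sketch mirrors the Mode.BEST_SPEED / Mode.BEST_COMPRESSION choice in Lucene50StoredFieldsFormat only in spirit. In a real index, the mode would be picked once, through the custom Codec subclass proposed in the issue, rather than per reader as in this toy.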