[ https://issues.apache.org/jira/browse/LUCENE-9378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17113535#comment-17113535 ]
Michael Sokolov commented on LUCENE-9378:
-----------------------------------------

Here are the index file sizes (after merging to a single segment). In total there was a ~6.5% reduction in index size, although the doc values (dvd) file shrank considerably more, by ~28%.

h3. Before

|4|_h4.dii|
|276168|_h4.dim|
|892220|_h4.fdt|
|840|_h4.fdx|
|4|_h4.fnm|
|1981564|_h4_Lucene80_0.dvd|
|24|_h4_Lucene80_0.dvm|
|5111752|_h4_Lucene84_0.doc|
|4108112|_h4_Lucene84_0.pos|
|1145544|_h4_Lucene84_0.tim|
|23268|_h4_Lucene84_0.tip|
|65104|_h4.nvd|
|4|_h4.nvm|
|4|_h4.si|
|4|segments_3|
|0|write.lock|
|13604636|TOTAL|

h3. After

|4|_h5.dii|
|276480|_h5.dim|
|12|_h5.fdm|
|889700|_h5.fdt|
|820|_h5.fdx|
|4|_h5.fnm|
|1421700|_h5_Lucene80_0.dvd|
|4|_h5_Lucene80_0.dvm|
|5111616|_h5_Lucene84_0.doc|
|4108024|_h5_Lucene84_0.pos|
|848876|_h5_Lucene84_0.tim|
|23244|_h5_Lucene84_0.tip|
|65104|_h5.nvd|
|4|_h5.nvm|
|4|_h5.si|
|4|segments_3|
|12745620|TOTAL|

> Configurable compression for BinaryDocValues
> --------------------------------------------
>
>                 Key: LUCENE-9378
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9378
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Viral Gandhi
>            Priority: Minor
>
> Lucene 8.5.1 includes a change to always [compress BinaryDocValues|https://issues.apache.org/jira/browse/LUCENE-9211]. This caused a ~30% reduction in our red-line QPS (throughput).
> We think users should be given some way to opt in to this compression feature, rather than having it always enabled, because it can carry a substantial query-time cost, as we saw during our upgrade. [~mikemccand] suggested one possible approach: introduce a *mode* in Lucene84DocValuesFormat (COMPRESSED and UNCOMPRESSED) and allow users to create a custom Codec that subclasses the default Codec and picks the format they want.
> The idea is similar to Lucene50StoredFieldsFormat, which has two modes, Mode.BEST_SPEED and Mode.BEST_COMPRESSION.
> Here is a related issue for adding a benchmark covering BINARY doc values query-time performance: [https://github.com/mikemccand/luceneutil/issues/61]
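
The percentages quoted in the comment can be rechecked from the TOTAL and .dvd rows of the two tables. The snippet below is only a minimal sketch of that arithmetic: the class and method names are made up for illustration, the numbers are copied from the tables as-is, and since the message does not state the unit of the sizes they are treated as plain counts.

```java
import java.util.Locale;

// Hypothetical helper, not part of Lucene. Sizes are copied from the
// "Before" and "After" tables in the comment; their unit is not stated.
class IndexSizeReduction {

    /** Percentage by which {@code after} is smaller than {@code before}. */
    static double percentReduction(long before, long after) {
        if (before <= 0) {
            throw new IllegalArgumentException("before must be positive");
        }
        return 100.0 * (before - after) / before;
    }

    public static void main(String[] args) {
        long totalBefore = 13_604_636L; // TOTAL row, Before
        long totalAfter = 12_745_620L;  // TOTAL row, After
        long dvdBefore = 1_981_564L;    // _h4_Lucene80_0.dvd
        long dvdAfter = 1_421_700L;     // _h5_Lucene80_0.dvd

        System.out.printf(Locale.ROOT, "total: %.1f%%%n",
                percentReduction(totalBefore, totalAfter));
        System.out.printf(Locale.ROOT, "dvd:   %.1f%%%n",
                percentReduction(dvdBefore, dvdAfter));
    }
}
```

With the figures from the tables this gives roughly 6.3% for the whole index and 28.3% for the .dvd file, close to the rounded values quoted in the comment.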