[ https://issues.apache.org/jira/browse/LUCENE-9378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17113535#comment-17113535 ]

Michael Sokolov commented on LUCENE-9378:
-----------------------------------------

Here are the index file sizes (after merging to a single segment). In total 
the index shrank by ~6.3%, while the doc values data (.dvd) file shrank 
considerably more, by ~28%.
h3. Before
||Size||File||
|4|_h4.dii|
|276168|_h4.dim|
|892220|_h4.fdt|
|840|_h4.fdx|
|4|_h4.fnm|
|1981564|_h4_Lucene80_0.dvd|
|24|_h4_Lucene80_0.dvm|
|5111752|_h4_Lucene84_0.doc|
|4108112|_h4_Lucene84_0.pos|
|1145544|_h4_Lucene84_0.tim|
|23268|_h4_Lucene84_0.tip|
|65104|_h4.nvd|
|4|_h4.nvm|
|4|_h4.si|
|4|segments_3|
|0|write.lock|
|13604636|TOTAL|
h3. After
||Size||File||
|4|_h5.dii|
|276480|_h5.dim|
|12|_h5.fdm|
|889700|_h5.fdt|
|820|_h5.fdx|
|4|_h5.fnm|
|1421700|_h5_Lucene80_0.dvd|
|4|_h5_Lucene80_0.dvm|
|5111616|_h5_Lucene84_0.doc|
|4108024|_h5_Lucene84_0.pos|
|848876|_h5_Lucene84_0.tim|
|23244|_h5_Lucene84_0.tip|
|65104|_h5.nvd|
|4|_h5.nvm|
|4|_h5.si|
|4|segments_3|
|12745620|TOTAL|
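The percentages in the comment can be recomputed from the TOTAL and .dvd rows of the two tables. Below is a minimal Java sketch of that arithmetic. The class name SizeReduction is just for illustration. The comment doesn't state the units of the sizes, but the ratios don't depend on them.

```java
import java.util.Locale;

// Recomputes the percentage reductions from the "Before" and "After" tables.
// Sizes are copied from the TOTAL and _hN_Lucene80_0.dvd rows; units are not
// stated in the comment, but the ratios do not depend on them.
public final class SizeReduction {

    /** Percentage by which {@code after} is smaller than {@code before}. */
    public static double percentReduction(long before, long after) {
        return 100.0 * (before - after) / before;
    }

    public static void main(String[] args) {
        long totalBefore = 13_604_636L, totalAfter = 12_745_620L;
        long dvdBefore = 1_981_564L, dvdAfter = 1_421_700L;

        // Locale.ROOT keeps "." as the decimal separator regardless of the JVM locale.
        System.out.printf(Locale.ROOT, "total index: %.1f%% smaller%n",
                percentReduction(totalBefore, totalAfter));
        System.out.printf(Locale.ROOT, "doc values (.dvd): %.1f%% smaller%n",
                percentReduction(dvdBefore, dvdAfter));
    }
}
```

Run as-is, it prints a 6.3% reduction for the whole index and a 28.3% reduction for the .dvd file.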

> Configurable compression for BinaryDocValues
> --------------------------------------------
>
>                 Key: LUCENE-9378
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9378
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Viral Gandhi
>            Priority: Minor
>
> Lucene 8.5.1 includes a change to always [compress 
> BinaryDocValues|https://issues.apache.org/jira/browse/LUCENE-9211]. This 
> caused a ~30% reduction in our red-line QPS (throughput). 
> We think users should be given some way to opt in to this compression 
> feature instead of having it always enabled, since it can have a substantial 
> query-time cost, as we saw during our upgrade. [~mikemccand] suggested one 
> possible approach: introduce a *mode* in Lucene84DocValuesFormat (COMPRESSED 
> and UNCOMPRESSED) and allow users to create a custom Codec that subclasses 
> the default Codec and picks the format they want.
> The idea is similar to Lucene50StoredFieldsFormat, which has two modes, 
> Mode.BEST_SPEED and Mode.BEST_COMPRESSION.
> Here's the related issue for adding a benchmark covering BINARY doc values 
> query-time performance: [https://github.com/mikemccand/luceneutil/issues/61]
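The trade-off behind the proposed COMPRESSED/UNCOMPRESSED mode can be illustrated with a small, self-contained Java sketch. Everything in it is hypothetical. BinaryValuesStore is not a Lucene class, and java.util.zip's Deflater/Inflater stand in for whatever compression Lucene applies internally to BinaryDocValues. The sketch shows the shape of the idea. The mode is fixed when the store is constructed, and in COMPRESSED mode every read pays a decompression step, which is where a query-time cost would come from.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical illustration of a per-format compression mode for binary values.
// BinaryValuesStore is NOT a Lucene class, and java.util.zip stands in for the
// compression Lucene actually applies to BinaryDocValues.
public final class BinaryValuesStore {

    /** Mode names follow the proposal in the issue description. */
    public enum Mode { COMPRESSED, UNCOMPRESSED }

    private final Mode mode;

    /** The mode is fixed at construction, as a codec would fix it per format. */
    public BinaryValuesStore(Mode mode) {
        this.mode = mode;
    }

    /** Encodes one value for storage; only COMPRESSED mode runs a compressor. */
    public byte[] encode(byte[] value) {
        if (mode == Mode.UNCOMPRESSED) {
            return value.clone();
        }
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        try {
            deflater.setInput(value);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[256];
            while (!deflater.finished()) {
                int n = deflater.deflate(buf);
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native zlib resources
        }
    }

    /**
     * Decodes a stored value. In COMPRESSED mode every read must decompress,
     * which is the kind of per-lookup cost that shows up at query time.
     * The caller supplies the original length (an assumption of this sketch).
     */
    public byte[] decode(byte[] stored, int originalLength) {
        if (mode == Mode.UNCOMPRESSED) {
            return stored.clone();
        }
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(stored);
            byte[] result = new byte[originalLength];
            int off = 0;
            while (off < originalLength) {
                int n = inflater.inflate(result, off, originalLength - off);
                if (n == 0) {
                    break; // stream ended early or needs more input
                }
                off += n;
            }
            if (off != originalLength) {
                throw new IllegalArgumentException("stored value is truncated");
            }
            return result;
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("stored value is corrupt", e);
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) {
        byte[] value = "lucene lucene lucene lucene lucene lucene lucene lucene"
                .getBytes(StandardCharsets.UTF_8);
        for (Mode mode : Mode.values()) {
            BinaryValuesStore store = new BinaryValuesStore(mode);
            byte[] stored = store.encode(value);
            boolean ok = Arrays.equals(value, store.decode(stored, value.length));
            System.out.println(mode + ": " + value.length + " -> " + stored.length
                    + " bytes, round trip ok = " + ok);
        }
    }
}
```

In the proposal itself, the switch would live in Lucene84DocValuesFormat and be chosen by a custom Codec subclassing the default one, much as Lucene50StoredFieldsFormat exposes Mode.BEST_SPEED and Mode.BEST_COMPRESSION. The sketch only mirrors that shape and does not use Lucene's API.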



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
