[ 
https://issues.apache.org/jira/browse/LUCENE-10014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17399722#comment-17399722
 ] 

Michael McCandless commented on LUCENE-10014:
---------------------------------------------

OK patch looks good to me – thanks [~weizijun].

I think this change should yield sizable disk savings for multi-valued numeric 
({{SortedNumericDocValues}}) doc values fields whose values share a large GCD, 
e.g. timestamps stored at nanosecond resolution whose actual granularity is 
milliseconds (GCD = 1,000,000) or seconds (GCD = 1,000,000,000).
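
To make the savings concrete, here is a rough, self-contained sketch (not the 
Lucene code itself) that compares the bits needed per value with and without 
dividing by the GCD. The class name {{GcdBitsSketch}} and its helpers are 
hypothetical, and {{bitsRequired}} only counts raw bits; the real 
{{DirectWriter.unsignedBitsRequired}} may round up to a supported width.

```java
final class GcdBitsSketch {
    /** Raw number of bits needed to hold v as an unsigned value (at least 1). */
    public static int bitsRequired(long v) {
        return Math.max(1, 64 - Long.numberOfLeadingZeros(v));
    }

    /** Euclid's algorithm; gcd(0, b) == b, which makes it easy to fold over a block. */
    public static long gcd(long a, long b) {
        while (b != 0) {
            long t = a % b;
            a = b;
            b = t;
        }
        return a;
    }

    /** GCD of (value - min) over the block; returns 0 if all values are equal. */
    public static long gcdOfDeltas(long[] values) {
        long min = Long.MAX_VALUE;
        for (long v : values) {
            min = Math.min(min, v);
        }
        long g = 0;
        for (long v : values) {
            g = gcd(g, v - min);
        }
        return g;
    }

    public static void main(String[] args) {
        // Hypothetical nanosecond timestamps that only change in whole milliseconds.
        long[] values = {
            1_628_000_000_000_000_000L,
            1_628_000_000_123_000_000L,
            1_628_000_059_999_000_000L
        };
        long min = values[0];
        long max = values[2];
        long g = gcdOfDeltas(values);
        System.out.println("gcd = " + g);
        System.out.println("bits without gcd = " + bitsRequired(max - min));
        System.out.println("bits with gcd    = " + bitsRequired((max - min) / g));
    }
}
```

For this sample block the range needs 36 raw bits per value, but only 16 once 
the range is divided by the GCD of 1,000,000.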

The test changes also assert that the indexed and retrieved doc values are the 
same, in addition to checking that they match what was stored in stored fields 
(what the existing test did).  Too bad {{Document}} doesn't have an API to 
retrieve multiple numeric (not just String) stored fields...

And I agree with [~weizijun] that the existing {{blocksOfVariousBPV}} tests do 
indeed exercise GCD, since the values all share a (randomly selected) multiplier 
each time. Since that test method also mixes up the bpv at block boundaries, it 
should be exercising the block (and GCD) compression.

I added a couple of comments and a {{CHANGES}} entry and will push soon. Thanks 
[~weizijun]!

> docvalue writeBlock gcd encode improve
> --------------------------------------
>
>                 Key: LUCENE-10014
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10014
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/codecs
>            Reporter: weizijun
>            Priority: Major
>         Attachments: LUCENE-10014.patch
>
>
> Lucene90DocValuesConsumer.writeBlock calculates bitsPerValue as:
> {code:java}
> final int bitsPerValue = DirectWriter.unsignedBitsRequired(max - min);
> {code}
>  It could instead divide by the GCD here:
> {code:java}
> final int bitsPerValue = DirectWriter.unsignedBitsRequired((max - min) / gcd);
> {code}
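
The proposal quoted above amounts to storing (value - min) / gcd for each value 
in a block and reversing that on read. A minimal round-trip sketch follows, 
assuming the GCD evenly divides every (value - min); the class 
{{GcdBlockRoundTrip}} and its methods are hypothetical and are not taken from 
the patch.

```java
import java.util.Arrays;

final class GcdBlockRoundTrip {
    /** Euclid's algorithm; gcd(0, b) == b. */
    public static long gcd(long a, long b) {
        while (b != 0) {
            long t = a % b;
            a = b;
            b = t;
        }
        return a;
    }

    public static long min(long[] values) {
        long m = Long.MAX_VALUE;
        for (long v : values) {
            m = Math.min(m, v);
        }
        return m;
    }

    /** GCD of (value - min) over the block; falls back to 1 when all values are equal. */
    public static long blockGcd(long[] values, long min) {
        long g = 0;
        for (long v : values) {
            g = gcd(g, v - min);
        }
        return g == 0 ? 1 : g;
    }

    /** Encodes each value as the small quotient (value - min) / gcd. */
    public static long[] encode(long[] values, long min, long gcd) {
        long[] out = new long[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = (values[i] - min) / gcd;
        }
        return out;
    }

    /** Restores the original values as min + gcd * quotient. */
    public static long[] decode(long[] encoded, long min, long gcd) {
        long[] out = new long[encoded.length];
        for (int i = 0; i < encoded.length; i++) {
            out[i] = min + gcd * encoded[i];
        }
        return out;
    }

    public static void main(String[] args) {
        long[] values = {100, 400, 1000, 700};
        long min = min(values);
        long g = blockGcd(values, min);
        long[] encoded = encode(values, min, g);
        long[] decoded = decode(encoded, min, g);
        System.out.println("encoded = " + Arrays.toString(encoded));
        System.out.println("round trip ok = " + Arrays.equals(values, decoded));
    }
}
```

The encoded quotients are what would be bit-packed, so the smaller they are, the 
fewer bits each value needs.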



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

