[ https://issues.apache.org/jira/browse/LUCENE-10014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17399722#comment-17399722 ]
Michael McCandless commented on LUCENE-10014:
---------------------------------------------

OK, the patch looks good to me. Thanks [~weizijun].

I think this change should mean a sizable disk savings for multi-valued numeric ({{SortedNumericDocValues}}) doc values fields whose values share a high GCD, e.g. nanosecond-resolution timestamps whose actual granularity is milliseconds (GCD = 1,000,000) or seconds (GCD = 1,000,000,000).

The test changes also assert that the indexed and retrieved doc values are the same, in addition to checking that they match what was stored in stored fields (which is all the existing test did). Too bad {{Document}} doesn't have an API to retrieve multiple numeric (not just String) stored fields...

And I agree with [~weizijun] that the existing {{blocksOfVariousBPV}} test does indeed exercise GCD, since the values all share a (randomly selected) multiplier each time. Since that test method also mixes up the bpv at block boundaries, it should be exercising the block (and GCD) compression.

I added a couple of comments and a {{CHANGES}} entry and will push soon. Thanks [~weizijun]!

> docvalue writeBlock gcd encode improve
> --------------------------------------
>
>                 Key: LUCENE-10014
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10014
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/codecs
>            Reporter: weizijun
>            Priority: Major
>         Attachments: LUCENE-10014.patch
>
>
> Lucene90DocValuesConsumer.writeBlock calculates bitsPerValue as:
> {code:java}
> final int bitsPerValue = DirectWriter.unsignedBitsRequired(max - min);
> {code}
> It can use the GCD here instead:
> {code:java}
> (max - min) / gcd
> {code}
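
To make the saving discussed above concrete, here is a minimal sketch in plain Java. It is not the actual {{Lucene90DocValuesConsumer}} code: the class {{GcdBitsSketch}} and its helpers are hypothetical names invented for illustration, and {{bitsRequired}} is only a stand-in for {{DirectWriter.unsignedBitsRequired}} that does not round up to the bit widths the real writer supports. It assumes a non-empty block whose {{max - min}} does not overflow a long.

```java
import java.util.Arrays;

/** Illustration only: compares bits per value with and without GCD division. */
final class GcdBitsSketch {

  /** Euclid's algorithm on non-negative longs; gcd(0, b) == b. */
  static long gcd(long a, long b) {
    while (b != 0) {
      long t = a % b;
      a = b;
      b = t;
    }
    return a;
  }

  /** Bits needed to store an unsigned value, never fewer than 1 (no rounding to supported widths). */
  static int bitsRequired(long maxValue) {
    return Math.max(1, 64 - Long.numberOfLeadingZeros(maxValue));
  }

  /** GCD of all (value - min) deltas in the block; 0 when every value is equal. */
  static long blockGcd(long[] values) {
    long min = Arrays.stream(values).min().getAsLong();
    long g = 0;
    for (long v : values) {
      g = gcd(g, v - min);
    }
    return g;
  }

  public static void main(String[] args) {
    // Nanosecond timestamps that only ever change in whole milliseconds.
    long[] block = {1_000_000_000L, 1_003_000_000L, 1_010_000_000L, 1_250_000_000L};
    long min = Arrays.stream(block).min().getAsLong();
    long max = Arrays.stream(block).max().getAsLong();
    long g = blockGcd(block);
    long divisor = (g == 0) ? 1 : g; // guard against a block of identical values

    System.out.println("gcd = " + g);                                               // gcd = 1000000
    System.out.println("bits without gcd = " + bitsRequired(max - min));             // 28
    System.out.println("bits with gcd    = " + bitsRequired((max - min) / divisor)); // 8
  }
}
```

In this toy block the per-value width drops from 28 bits to 8, which is the kind of saving the comment describes for millisecond-granularity values stored at nanosecond resolution.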