[ https://issues.apache.org/jira/browse/LUCENE-9899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17312706#comment-17312706 ]

Jim Ferenczi commented on LUCENE-9899:
--------------------------------------

It's hard to say in the random case. I stumbled upon this while testing the 
compression of doc values when they are used in index sorting. I am indexing 
timestamps with a fixed gap of 10,000 milliseconds, so I was curious why doc 
values needed 28 bits per value in this optimized case (the index is sorted by 
the timestamp). With the patch, it drops to 16 bits per value. That's still a 
lot for a use case with a fixed delta, but that's a different issue. 
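The 28 vs. 16 bits can be reproduced with back-of-the-envelope arithmetic. This is a sketch under stated assumptions, not a statement of how Lucene actually lays out the data: it assumes one timestamp per doc, blocks of 16,384 (2^14) values, and that the computed bit count is rounded up to the nearest width the writer supports (assumed here to include 16 but not 14; 28 is assumed to be supported as-is).

```latex
\begin{aligned}
\max - \min &= (16384 - 1) \times 10000 = 163{,}830{,}000,
  && 2^{27} \le 163{,}830{,}000 < 2^{28} \;\Rightarrow\; 28 \text{ bits} \\
\frac{\max - \min}{\gcd} &= \frac{163{,}830{,}000}{10000} = 16383,
  && 2^{13} \le 16383 < 2^{14} \;\Rightarrow\; 14 \text{ bits} \;\to\; 16 \text{ bits (rounded up)}
\end{aligned}
```

Under these assumptions, dividing by the GCD removes about log2(10000) ≈ 13.3 bits of range, and the rounding to a supported width explains why the result is 16 rather than 14.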

> Numeric DV block compression ignores the gcd when computing the number of 
> bits required
> ---------------------------------------------------------------------------------------
>
>                 Key: LUCENE-9899
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9899
>             Project: Lucene - Core
>          Issue Type: New Feature
>            Reporter: Jim Ferenczi
>            Priority: Major
>         Attachments: LUCENE-9899.patch
>
>
> When numeric doc values are split per block, we compute the number of bits 
> per value [from the minimum and maximum value present in the 
> block|https://github.com/apache/lucene/blob/d5d6dc079395c47cd6d12dcce3bcfdd2c7d9dc63/lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90DocValuesConsumer.java#L390].
>  However, the greatest common divisor (GCD) is not taken into account, so the 
> number of bits is overestimated whenever the GCD is greater than 1.
>  
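A minimal Java sketch of the difference described in the issue: computing per-block bits from `max - min` alone versus dividing the deltas by their GCD first. This is not the code in `Lucene90DocValuesConsumer`; the class and method names (`BlockBitsSketch`, `bitsPerValue`, `roundBits`), the 2^14 block size, and the table of supported bit widths are all assumptions made for illustration.

```java
import java.util.Arrays;

public class BlockBitsSketch {
    // Assumed set of bit widths the (hypothetical) packed writer supports;
    // a computed width is rounded up to the next entry.
    private static final int[] SUPPORTED_BITS = {1, 2, 4, 8, 12, 16, 20, 24, 28, 32, 40, 48, 56, 64};

    static int roundBits(int bits) {
        for (int b : SUPPORTED_BITS) {
            if (b >= bits) {
                return b;
            }
        }
        throw new IllegalArgumentException("bits > 64: " + bits);
    }

    // Minimal number of bits needed to store the unsigned value v (at least 1).
    static int unsignedBitsRequired(long v) {
        return Math.max(1, 64 - Long.numberOfLeadingZeros(v));
    }

    // Euclid's algorithm; gcd(0, x) == |x|.
    static long gcd(long a, long b) {
        while (b != 0) {
            long t = a % b;
            a = b;
            b = t;
        }
        return Math.abs(a);
    }

    /** Bits per value for one block; when useGcd is set, deltas from min are divided by their GCD. */
    static int bitsPerValue(long[] block, boolean useGcd) {
        long min = Arrays.stream(block).min().getAsLong();
        long max = Arrays.stream(block).max().getAsLong();
        long range = max - min; // assumes the subtraction does not overflow
        if (useGcd) {
            long g = 0;
            for (long v : block) {
                g = gcd(g, v - min);
            }
            if (g > 1) {
                range /= g;
            }
        }
        return roundBits(unsignedBitsRequired(range));
    }

    public static void main(String[] args) {
        int blockSize = 1 << 14; // assumed block size
        long start = 1_600_000_000_000L; // arbitrary epoch-millis start
        long[] block = new long[blockSize];
        for (int i = 0; i < blockSize; i++) {
            block[i] = start + i * 10_000L; // sorted timestamps, fixed 10s gap
        }
        System.out.println("without gcd: " + bitsPerValue(block, false));
        System.out.println("with gcd:    " + bitsPerValue(block, true));
    }
}
```

Running `main` prints 28 bits without the GCD and 16 bits with it, matching the numbers in the comment above under the stated block-size and bit-width assumptions. A real codec would also need to record the GCD (and min) per block so readers can reconstruct `min + gcd * packed`; the sketch omits that.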



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
