[ https://issues.apache.org/jira/browse/LUCENE-4936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adrien Grand updated LUCENE-4936:
---------------------------------

    Attachment: LUCENE-4936.patch

Patch:

 * Adds MathUtil.gcd(long, long)

 * Adds "GCD compression" to the Lucene42, Disk and CheapBastard doc values formats.

 * Improves BaseDocValuesFormatTest, which previously tested almost only 
"TABLE_COMPRESSED" with Lucene42DVF.

 * No longer attempts to compress storage when the values are known to be 
dense, such as SORTED ords.
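A minimal Java sketch of what "GCD compression" means, assuming (as a hypothetical layout, not taken from the patch) that the minimum value and the GCD of all deltas are stored once, followed by one small quotient (value - min) / gcd per document. The names gcd, Encoded and encode are hypothetical; this is not Lucene's MathUtil.gcd or its actual doc values encoding.

```java
import java.util.Arrays;

public class GcdCompressionSketch {

    // Hypothetical helper (not Lucene's MathUtil.gcd): Euclid's algorithm.
    // Assumes neither argument is Long.MIN_VALUE, whose absolute value overflows.
    static long gcd(long a, long b) {
        a = Math.abs(a);
        b = Math.abs(b);
        while (b != 0) {
            long t = a % b;
            a = b;
            b = t;
        }
        return a;
    }

    // Header (min, gcd) stored once, plus one small quotient per document.
    static final class Encoded {
        final long min;
        final long gcd;
        final long[] quotients;

        Encoded(long min, long gcd, long[] quotients) {
            this.min = min;
            this.gcd = gcd;
            this.quotients = quotients;
        }

        // Decoding is one multiply-add per lookup.
        long get(int doc) {
            return min + gcd * quotients[doc];
        }
    }

    static Encoded encode(long[] values) {
        long min = Arrays.stream(values).min().orElse(0L);
        long g = 0;
        for (long v : values) {
            // Deltas are >= 0; this sketch assumes max - min fits in a long.
            g = gcd(g, v - min);
        }
        if (g == 0) {
            g = 1; // all values are equal (or there are none): nothing to divide out
        }
        long[] quotients = new long[values.length];
        for (int i = 0; i < values.length; i++) {
            quotients[i] = (values[i] - min) / g;
        }
        return new Encoded(min, g, quotients);
    }

    public static void main(String[] args) {
        Encoded e = encode(new long[] {1_000L, 5_000L, 3_000L, 9_000L});
        // prints "2000 [0, 2, 1, 4]"
        System.out.println(e.gcd + " " + Arrays.toString(e.quotients));
    }
}
```

The quotients need far fewer bits than the raw deltas whenever the GCD is large, which is exactly the date case described in the issue.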

I measured how much slower doc values indexing becomes with these new checks. 
With random or dense values the difference is unnoticeable, since the GCD 
quickly reaches 1. When the GCD is larger, indexing was only 2% slower (in this 
benchmark, every doc has a single NumericDocValuesField). So I think it's fine.
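The overhead argument relies on the GCD reaching 1 early. The sketch below, a hypothetical single pass that tracks min, max and GCD, skips all GCD work once it hits 1. It uses the first value as the reference point instead of the minimum; this yields the same GCD, because every v - min is a difference of two v - first terms and vice versa. The overflow guard (giving up on GCD for values beyond half of Long's range) and all names are assumptions of this sketch, not details from the patch.

```java
public class GcdScanSketch {

    // Euclid's algorithm; the guard in add() keeps arguments away from Long.MIN_VALUE.
    static long gcd(long a, long b) {
        a = Math.abs(a);
        b = Math.abs(b);
        while (b != 0) {
            long t = a % b;
            a = b;
            b = t;
        }
        return a;
    }

    long min = Long.MAX_VALUE;
    long max = Long.MIN_VALUE;
    long gcdSoFar = 0; // 0 means "no delta seen yet"
    long first;        // reference value for deltas
    int count = 0;
    int gcdCalls = 0;  // how many times gcd() actually ran

    void add(long v) {
        if (gcdSoFar != 1) {
            if (v < Long.MIN_VALUE / 2 || v > Long.MAX_VALUE / 2) {
                // Assumed guard: v - first could overflow, so give up on GCD compression.
                gcdSoFar = 1;
            } else if (count > 0) {
                gcdSoFar = gcd(gcdSoFar, v - first);
                gcdCalls++;
            }
        }
        if (count == 0) {
            first = v;
        }
        min = Math.min(min, v);
        max = Math.max(max, v);
        count++;
    }

    public static void main(String[] args) {
        GcdScanSketch s = new GcdScanSketch();
        for (long v : new long[] {7, 10, 13, 14, 100, 200}) {
            s.add(v);
        }
        // prints "gcd=1 after 3 gcd calls"
        System.out.println("gcd=" + s.gcdSoFar + " after " + s.gcdCalls + " gcd calls");
    }
}
```

Once the GCD is 1, each further value costs only the min/max updates, which is consistent with the overhead being unnoticeable for random or dense values.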
                
> docvalues date compression
> --------------------------
>
>                 Key: LUCENE-4936
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4936
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/index
>            Reporter: Robert Muir
>            Assignee: Adrien Grand
>         Attachments: LUCENE-4936.patch, LUCENE-4936.patch
>
>
> DocValues fields can be very wasteful if you are storing dates (as Solr's 
> TrieDateField does if you enable docvalues) and don't actually need all the 
> precision: e.g. "date-only" fields like date of birth with no time component, 
> time fields without millisecond precision, and so on.
> Ideally we'd compute the GCD of all the values to save space 
> (numberOfTrailingZeros is not really enough here), but I think we should at 
> least look for values like 86400000, 3600000, and 1000 to be practical.
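A worked example of why trailing zeros are not enough for date-only values: 86400000 = 2^10 * 84375, so shifting out common trailing zeros removes only 10 bits per value, while dividing by the GCD removes the whole day granularity. The sketch compares bits per value for three sample dates stored as UTC-midnight epoch milliseconds; the dates and all names are illustrative assumptions.

```java
import java.time.LocalDate;

public class DateBitsSketch {

    static final long MILLIS_PER_DAY = 86_400_000L;

    // Euclid's algorithm on non-negative deltas.
    static long gcd(long a, long b) {
        while (b != 0) {
            long t = a % b;
            a = b;
            b = t;
        }
        return a;
    }

    // Bits needed to store a non-negative value (at least 1).
    static int bitsRequired(long maxValue) {
        return maxValue == 0 ? 1 : 64 - Long.numberOfLeadingZeros(maxValue);
    }

    // Returns bits per value for {plain delta, delta >>> trailingZeros, delta / gcd}.
    // Assumes max - min fits in a long.
    static int[] compare(long[] values) {
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        for (long v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        long g = 0;
        long or = 0; // OR of all deltas: its trailing zeros are shared by every delta
        for (long v : values) {
            g = gcd(g, v - min);
            or |= v - min;
        }
        long range = max - min;
        int shift = or == 0 ? 0 : Long.numberOfTrailingZeros(or);
        long divisor = g == 0 ? 1 : g;
        return new int[] {bitsRequired(range), bitsRequired(range >>> shift), bitsRequired(range / divisor)};
    }

    public static void main(String[] args) {
        long[] values = {
            LocalDate.of(1900, 1, 1).toEpochDay() * MILLIS_PER_DAY,
            LocalDate.of(1900, 1, 2).toEpochDay() * MILLIS_PER_DAY,
            LocalDate.of(2013, 1, 1).toEpochDay() * MILLIS_PER_DAY,
        };
        int[] bits = compare(values);
        // prints "plain=42 shift=32 gcd=16"
        System.out.println("plain=" + bits[0] + " shift=" + bits[1] + " gcd=" + bits[2]);
    }
}
```

Here the GCD of the deltas is exactly 86400000, so the quotient is just the number of days (41273 for the full range), which fits in 16 bits instead of 42.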
