[ https://issues.apache.org/jira/browse/LUCENE-7351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Adrien Grand updated LUCENE-7351:
---------------------------------
    Attachment: LUCENE-7351.patch

I have been experimenting with the attached patch. It compresses doc ids based on the number of bytes required to store them, specializing only 8, 16, 24 and 32 bits per doc id. It also adds delta-compression for blocks whose values are all the same.

IndexAndSearchOpenStreetMaps reported a 1.7% slowdown on the box benchmark (72.3 QPS -> 71.1 QPS), but storage requirements decreased by 9.1% (635MB -> 577MB).

The savings are even larger for types that need fewer bytes per value than LatLonPoint, which requires 8 bytes per value. For instance, indexing 10M random half floats requires 28MB with the patch vs 43MB on master (-35%).

> BKDWriter should compress doc ids when all values in a block are the same
> -------------------------------------------------------------------------
>
>                 Key: LUCENE-7351
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7351
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-7351.patch
>
>
> BKDWriter writes doc ids using 4 bytes per document. I think it should
> compress similarly to postings when all docs in a block have the same packed
> value. This can happen either when a field has a default value which is
> common across documents or when quantization makes the number of unique
> values so small that a large index will necessarily have blocks that all
> contain the same value (e.g. there are only 63490 unique half-float values).
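The following is a minimal Java sketch of the two ideas described above. It assumes a made-up block layout: a one-byte header, big-endian fixed-width doc ids, and 7-bit vInt gaps. The class `DocIdBlockSketch` and all of its methods are hypothetical. This is not the code in LUCENE-7351.patch, and it is not part of the Lucene API.

The sketch picks the smallest of 8, 16, 24 or 32 bits that fits every doc id in a block. When all values in the block are the same, it switches to sorted delta vInts instead. It also checks the 63490 figure by counting the half-float bit patterns that are not NaN.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

/**
 * Hypothetical sketch of per-block doc id compression; NOT Lucene's BKDWriter code.
 * Assumed layout: header byte 0 = sorted delta vInts, 1..4 = fixed-width doc ids of that many bytes.
 */
public class DocIdBlockSketch {

    /** Smallest of 1, 2, 3 or 4 bytes (8/16/24/32 bits) that can hold every doc id. */
    static int bytesPerDocId(int[] docIds) {
        int or = 0;
        for (int d : docIds) or |= d; // doc ids are non-negative, so OR-ing bounds the max
        if ((or >>> 8) == 0) return 1;
        if ((or >>> 16) == 0) return 2;
        if ((or >>> 24) == 0) return 3;
        return 4;
    }

    static byte[] encodeBlock(int[] docIds, boolean allValuesEqual) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (allValuesEqual) {
            // Every doc shares one value, so doc order carries no information:
            // sort a copy and store the gaps between consecutive ids as vInts.
            int[] sorted = docIds.clone();
            Arrays.sort(sorted);
            out.write(0);
            int previous = 0;
            for (int d : sorted) {
                writeVInt(out, d - previous);
                previous = d;
            }
        } else {
            int width = bytesPerDocId(docIds);
            out.write(width);
            for (int d : docIds) {
                for (int shift = 8 * (width - 1); shift >= 0; shift -= 8) {
                    out.write((d >>> shift) & 0xFF); // big-endian, `width` bytes per doc id
                }
            }
        }
        return out.toByteArray();
    }

    static int[] decodeBlock(byte[] data, int count) {
        int[] docIds = new int[count];
        int header = data[0];
        int pos = 1;
        if (header == 0) {
            int previous = 0;
            for (int i = 0; i < count; i++) {
                int gap = 0, shift = 0, b;
                do { // read one vInt: 7 payload bits per byte, high bit = "more bytes follow"
                    b = data[pos++] & 0xFF;
                    gap |= (b & 0x7F) << shift;
                    shift += 7;
                } while ((b & 0x80) != 0);
                previous += gap;
                docIds[i] = previous;
            }
        } else {
            for (int i = 0; i < count; i++) {
                int d = 0;
                for (int k = 0; k < header; k++) d = (d << 8) | (data[pos++] & 0xFF);
                docIds[i] = d;
            }
        }
        return docIds;
    }

    static void writeVInt(ByteArrayOutputStream out, int v) {
        while ((v & ~0x7F) != 0) {
            out.write((v & 0x7F) | 0x80);
            v >>>= 7;
        }
        out.write(v);
    }

    /** Counts 16-bit patterns that are not NaN (exponent all ones with a non-zero mantissa). */
    static int countNonNaNHalfFloats() {
        int count = 0;
        for (int bits = 0; bits < (1 << 16); bits++) {
            int exponent = (bits >>> 10) & 0x1F;
            int mantissa = bits & 0x3FF;
            if (!(exponent == 0x1F && mantissa != 0)) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        int[] docIds = {3, 17, 42, 300, 1021};
        System.out.println(encodeBlock(docIds, false).length); // 11: header + 5 x 2 bytes
        System.out.println(encodeBlock(docIds, true).length);  // 8: header + vInt gaps 3, 14, 25, 258, 721
        System.out.println(countNonNaNHalfFloats());           // 63490
    }
}
```

The sketch stores the five doc ids in 11 bytes on the fixed-width path and 8 bytes on the delta path. At 4 bytes per doc id, the same block takes 20 bytes. Sorting before delta-coding is only safe because every doc in the block shares the same value, which is why the sketch gates it on `allValuesEqual`.

The half-float count treats +0 and -0, and both infinities, as distinct values: 65536 - 2 x 1023 NaN patterns = 63490.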