[ https://issues.apache.org/jira/browse/LUCENE-9211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17036139#comment-17036139 ]
Mark Harwood commented on LUCENE-9211:
--------------------------------------

{quote}the link did not work.
{quote}
Sorry, formatting must have mangled my URL - this is the full link FWIW
[https://github.com/apache/lucene-solr/blob/master/lucene/benchmark/conf/spatial.alg#L31]

Thanks for testing and good to know your tests showed little difference in performance.

What's your view on how best to proceed from here? Wait for Juan's PR to land before doing any more?

> Adding compression to BinaryDocValues storage
> ---------------------------------------------
>
>                 Key: LUCENE-9211
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9211
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/codecs
>            Reporter: Mark Harwood
>            Assignee: Mark Harwood
>            Priority: Minor
>              Labels: pull-request-available
>
> While SortedSetDocValues can be used today to store identical values in a
> compact form this is not effective for data with many unique values.
> The proposal is that BinaryDocValues should be stored in LZ4 compressed
> blocks which can dramatically reduce disk storage costs in many cases. The
> proposal is blocks of a number of documents are stored as a single compressed
> blob along with metadata that records offsets where the original document
> values can be found in the uncompressed content.
> There's a trade-off here between efficient compression (more docs-per-block =
> better compression) and fast retrieval times (fewer docs-per-block = faster
> read access for single values). A fixed block size of 32 docs seems like it
> would be a reasonable compromise for most scenarios.
> A PR is up for review here [https://github.com/apache/lucene-solr/pull/1234]
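
The block layout described in the issue can be illustrated with a small sketch. This is not the code in the PR: it is a minimal Python illustration that assumes zlib as a stand-in for LZ4 (LZ4 is not in the Python standard library), and the names write_blocks and read_value are hypothetical. It shows per-document values grouped into blocks of 32, each block stored as one compressed blob plus the offsets needed to find a single document's value in the uncompressed content.

```python
import zlib

DOCS_PER_BLOCK = 32  # fixed block size suggested in the issue description


def write_blocks(values, docs_per_block=DOCS_PER_BLOCK):
    """Group per-document byte values into compressed blocks.

    Returns a list of (offsets, blob) tuples. offsets[i] is where document i
    of the block starts in the uncompressed content; the final entry is the
    total uncompressed length of the block.
    """
    blocks = []
    for start in range(0, len(values), docs_per_block):
        chunk = values[start:start + docs_per_block]
        offsets = [0]
        for value in chunk:
            offsets.append(offsets[-1] + len(value))
        # Assumption: zlib stands in for LZ4 here, purely for illustration.
        blob = zlib.compress(b"".join(chunk))
        blocks.append((offsets, blob))
    return blocks


def read_value(blocks, doc_id, docs_per_block=DOCS_PER_BLOCK):
    """Return one document's value by decompressing only the block it is in."""
    offsets, blob = blocks[doc_id // docs_per_block]
    raw = zlib.decompress(blob)
    i = doc_id % docs_per_block
    return raw[offsets[i]:offsets[i + 1]]
```

Reading a single value costs one block decompression, so a larger docs_per_block improves the compression ratio but makes each lookup decompress more data, which is the trade-off the description mentions.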