[ https://issues.apache.org/jira/browse/LUCENE-9378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17137820#comment-17137820 ]
Michael Gibney commented on LUCENE-9378:
----------------------------------------

[~alexklibisz], quick clarification regarding "for every doc in the lucene shard": do your benchmarks illustrating the regression evaluate the vector query over the full domain (i.e., literally every (live) doc in the index, without any pre-filtering of the search domain)?

This question is related to [~jpountz]'s comment above: "decompress all values when we need a single one in a block". It would make sense that docId-order access to docValues over the full domain could be faster (e.g., full-domain facets?); selective docId-order access (i.e., over a filtered domain) could be slower; and arbitrary (non-docId-order) access over the full domain would likely be a worst-case scenario (with respect to the impact of block size), all else being equal. The last of these would affect bulk-export-type use cases, which access docValues for each doc in arbitrary order.

> Configurable compression for BinaryDocValues
> --------------------------------------------
>
>                 Key: LUCENE-9378
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9378
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Viral Gandhi
>            Priority: Minor
>         Attachments: image-2020-06-12-22-17-30-339.png, image-2020-06-12-22-17-53-961.png, image-2020-06-12-22-18-24-527.png, image-2020-06-12-22-18-48-919.png
>          Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Lucene 8.5.1 includes a change to always [compress BinaryDocValues|https://issues.apache.org/jira/browse/LUCENE-9211]. This caused a ~30% reduction in our red-line QPS (throughput).
> We think users should be given some way to opt in to this compression, rather than having it always enabled, since it can have a substantial query-time cost, as we saw during our upgrade.
> [~mikemccand] suggested one possible approach: introduce a *mode* in Lucene80DocValuesFormat (COMPRESSED and UNCOMPRESSED) and allow users to create a custom Codec that subclasses the default Codec and picks the format they want.
> The idea is similar to Lucene50StoredFieldsFormat, which has two modes, Mode.BEST_SPEED and Mode.BEST_COMPRESSION.
> Here is a related issue for adding a benchmark covering BINARY doc values query-time performance: [https://github.com/mikemccand/luceneutil/issues/61]
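
The three access patterns raised in the comment (docId order over the full domain, docId order over a filtered domain, and arbitrary order over the full domain) can be illustrated with a toy model. The sketch below is not Lucene code: it assumes values are stored in fixed-size compressed blocks and that a reader keeps only the most recently decompressed block, so reading any value from a different block costs one full block decompression. The class name, the block size of 32, and the document count are all assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class BlockAccessModel {
    // Assumed number of docs per compressed block (illustrative only).
    static final int BLOCK_SIZE = 32;

    /**
     * Counts block decompressions for a reader that caches only the most
     * recently decompressed block and visits docIds in the given order.
     */
    static int countDecompressions(List<Integer> docIds, int blockSize) {
        int decompressions = 0;
        int cachedBlock = -1; // no block decompressed yet
        for (int docId : docIds) {
            int block = docId / blockSize;
            if (block != cachedBlock) {
                decompressions++;
                cachedBlock = block;
            }
        }
        return decompressions;
    }

    /** Returns docIds 0, step, 2*step, ... below maxDoc, in increasing order. */
    static List<Integer> range(int maxDoc, int step) {
        List<Integer> docs = new ArrayList<>();
        for (int d = 0; d < maxDoc; d += step) {
            docs.add(d);
        }
        return docs;
    }

    static void report(String label, List<Integer> docIds) {
        int decompressions = countDecompressions(docIds, BLOCK_SIZE);
        // Values decoded per value actually read; 1.0 means no wasted work.
        double decodedPerRead = (double) decompressions * BLOCK_SIZE / docIds.size();
        System.out.printf("%-32s reads=%d decompressions=%d decoded/read=%.1f%n",
                label, docIds.size(), decompressions, decodedPerRead);
    }

    public static void main(String[] args) {
        int maxDoc = 1 << 16; // hypothetical segment size

        List<Integer> full = range(maxDoc, 1);              // every doc, in docId order
        List<Integer> filtered = range(maxDoc, BLOCK_SIZE); // one matching doc per block
        List<Integer> shuffled = new ArrayList<>(full);     // every doc, arbitrary order
        Collections.shuffle(shuffled, new Random(42));

        report("full domain, docId order", full);
        report("filtered domain, docId order", filtered);
        report("full domain, arbitrary order", shuffled);
    }
}
```

In this model the in-order full scan decompresses each block once and uses every decoded value, the filtered scan pays a whole block per value read, and the shuffled scan decompresses a block for nearly every read, which is why the comment treats it as the likely worst case as block size grows. A real reader may behave differently (for example, by caching more than one block), so the numbers are only indicative.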