[ 
https://issues.apache.org/jira/browse/LUCENE-9378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17137925#comment-17137925
 ] 

Alex Klibisz commented on LUCENE-9378:
--------------------------------------

[~mgibney] 

> quick clarification regarding "for every doc in the lucene shard": do your 
> benchmarks illustrating the regression evaluate the vector query over the full 
> domain (i.e., literally every (live) doc in the index, without any 
> pre-filtering of the search domain)?

It reads every live doc in the Elasticsearch index, which consists of 
multiple Lucene shards. I'm not controlling the order. From the perspective of 
my plugin, I'm just getting a docId from Elasticsearch and using 
`advanceExact(docId)` to look up the binary value. Also, there are no deleted 
docs in this particular case, though there could be in practice.

Here's the exact snippet: 
[https://github.com/alexklibisz/elastiknn/blob/benchmarks-4/plugin/src/main/scala/com/klibisz/elastiknn/query/ExactQuery.scala#L25-L29]
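
For readers without the linked file at hand, the access pattern described above can be sketched roughly as follows. This is not the plugin's code: `BinaryValues` is a hypothetical stand-in for the two doc-values calls involved (position on a doc, then read its bytes), and `InMemoryValues` and `scan` are illustrative names, so the sketch runs without Lucene on the classpath.

```java
import java.util.Map;
import java.util.TreeMap;

public class ExactScanSketch {

    /** Hypothetical stand-in for the two doc-values calls used in the scan. */
    interface BinaryValues {
        boolean advanceExact(int docId); // true if docId has a stored value
        byte[] binaryValue();            // value of the doc positioned above
    }

    /** In-memory implementation so the sketch runs without Lucene. */
    static final class InMemoryValues implements BinaryValues {
        private final Map<Integer, byte[]> values;
        private byte[] current;
        int lookups = 0; // counts how often a value was requested

        InMemoryValues(Map<Integer, byte[]> values) {
            this.values = values;
        }

        @Override
        public boolean advanceExact(int docId) {
            lookups++;
            current = values.get(docId);
            return current != null;
        }

        @Override
        public byte[] binaryValue() {
            return current;
        }
    }

    /** Looks up the value of every doc id handed in; returns total bytes read. */
    static long scan(BinaryValues values, int[] docIds) {
        long bytesRead = 0;
        for (int docId : docIds) {
            // One lookup per doc: with compressed values, this is where any
            // per-lookup decompression cost would be paid.
            if (values.advanceExact(docId)) {
                bytesRead += values.binaryValue().length;
            }
        }
        return bytesRead;
    }

    public static void main(String[] args) {
        Map<Integer, byte[]> stored = new TreeMap<>();
        stored.put(0, new byte[] {1, 2, 3});
        stored.put(1, new byte[] {4, 5});
        stored.put(3, new byte[] {6}); // doc 2 has no value
        InMemoryValues values = new InMemoryValues(stored);
        long bytes = scan(values, new int[] {0, 1, 2, 3});
        System.out.println(bytes + " bytes from " + values.lookups + " lookups");
    }
}
```

The point of the sketch is only that an exact query touches the stored value of every candidate doc, so any fixed cost added to a single lookup is multiplied by the number of docs scanned.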

> Configurable compression for BinaryDocValues
> --------------------------------------------
>
>                 Key: LUCENE-9378
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9378
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Viral Gandhi
>            Priority: Minor
>         Attachments: image-2020-06-12-22-17-30-339.png, 
> image-2020-06-12-22-17-53-961.png, image-2020-06-12-22-18-24-527.png, 
> image-2020-06-12-22-18-48-919.png
>
>          Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Lucene 8.5.1 includes a change to always [compress 
> BinaryDocValues|https://issues.apache.org/jira/browse/LUCENE-9211]. This 
> caused a ~30% reduction in our red-line QPS (throughput). 
> We think users should be given some way to opt in to this compression 
> feature instead of it always being enabled, since it can have a substantial 
> query-time cost, as we saw during our upgrade. [~mikemccand] suggested one 
> possible approach: introduce a *mode* in Lucene80DocValuesFormat (COMPRESSED 
> and UNCOMPRESSED) and allow users to create a custom Codec that subclasses 
> the default Codec and picks the format they want.
> The idea is similar to Lucene50StoredFieldsFormat, which has two modes, 
> Mode.BEST_SPEED and Mode.BEST_COMPRESSION.
> Here's a related issue for adding a benchmark covering BINARY doc values 
> query-time performance: [https://github.com/mikemccand/luceneutil/issues/61]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
