[jira] [Commented] (LUCENE-7460) Should SortedNumericDocValues expose a per-document random-access API?

Adrien Grand (JIRA) Sun, 25 Sep 2016 21:51:56 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-7460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15522041#comment-15522041 ]


Adrien Grand commented on LUCENE-7460:
--------------------------------------

I find sorted numerics a bit hard to reason about since I am not very 
clear about the use-cases, but I guess that in some cases one would want to use 
the minimum value when sorting in ascending order and the maximum value when 
sorting in descending order, so having fast access to the maximum value too 
feels like an important feature. Of course users can index the min/max values 
directly, but I think there is also some value in flexibility, e.g. we do not 
require users to index edge n-grams to run prefix queries.
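
To make the min/max idea concrete, here is a minimal sketch of picking a 
per-document sort value from values that are already sorted. The class 
SortValueSketch, the method sortValue and the missingValue parameter are 
hypothetical names for illustration only, not part of Lucene's API.

```java
public class SortValueSketch {
    /**
     * Picks the value to sort a document by (hypothetical helper).
     * sortedValues must be in ascending order, which is how sorted
     * numerics present one document's values.
     */
    static long sortValue(long[] sortedValues, boolean ascending, long missingValue) {
        if (sortedValues.length == 0) {
            return missingValue; // document has no value for this field
        }
        // Ascending sort uses the minimum (first) value,
        // descending sort uses the maximum (last) value.
        return ascending ? sortedValues[0] : sortedValues[sortedValues.length - 1];
    }

    public static void main(String[] args) {
        long[] values = {3, 7, 15, 42};
        System.out.println(sortValue(values, true, 0));
        System.out.println(sortValue(values, false, 0));
    }
}
```

With random access both branches are a single lookup; with an iterator-only 
API the descending branch has to walk every value first.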

That said, I do not feel too strongly about it and mostly wanted to give some 
visibility to this change to our doc values API and discuss it. If you feel 
strongly about keeping the iterator API, I'm good with it.

> Should SortedNumericDocValues expose a per-document random-access API?
> ----------------------------------------------------------------------
>
>                 Key: LUCENE-7460
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7460
>             Project: Lucene - Core
>          Issue Type: Wish
>            Reporter: Adrien Grand
>            Priority: Minor
>
> Sorted numerics used to expose a per-document random-access API so that 
> accessing the median or max element would be cheap. The new 
> SortedNumericDocValues still exposes the number of values a document has, but 
> the only way to read values is to use {{nextValue}}, which forces reading all 
> values in order to read the max value.
> For instance, {{SortedNumericSelector.MAX}} does the following in master (the 
> important part is the for-loop):
> {code}
>     private void setValue() throws IOException {
>       int count = in.docValueCount();
>       for(int i=0;i<count;i++) {
>         value = in.nextValue();
>       }
>     }
>     @Override
>     public int nextDoc() throws IOException {
>       int docID = in.nextDoc();
>       if (docID != NO_MORE_DOCS) {
>         setValue();
>       }
>       return docID;
>     }
> {code}
> while it used to simply look up the value at index {{count-1}} in 6.x:
> {code}
>     @Override
>     public long get(int docID) {
>       in.setDocument(docID);
>       final int count = in.count();
>       if (count == 0) {
>         return 0; // missing
>       } else {
>         return in.valueAt(count-1);
>       }
>     }
> {code}
> This could be a conscious decision since a sequential API gives the codec 
> more opportunities to compress efficiently, but on the other hand 
> this API prevents sorting by max or median values from being efficient.
> On my end I have a preference for the random-access API.
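
The cost difference described in the issue can be sketched with two 
stand-in views over one document's values. This is only an illustration: 
IteratorView, RandomAccessView and the reads counters are hypothetical and 
merely mimic the shape of the master and 6.x APIs quoted above; they are not 
Lucene classes.

```java
import java.util.Arrays;

public class MaxValueAccessSketch {
    /** Hypothetical iterator-style view: values can only be read in order. */
    static final class IteratorView {
        private final long[] sorted;
        private int pos = 0;
        int reads = 0; // number of values read so far

        IteratorView(long[] values) {
            sorted = values.clone();
            Arrays.sort(sorted);
        }

        int docValueCount() { return sorted.length; }

        long nextValue() { reads++; return sorted[pos++]; }
    }

    /** Hypothetical random-access view: any index can be read directly. */
    static final class RandomAccessView {
        private final long[] sorted;
        int reads = 0; // number of values read so far

        RandomAccessView(long[] values) {
            sorted = values.clone();
            Arrays.sort(sorted);
        }

        int count() { return sorted.length; }

        long valueAt(int index) { reads++; return sorted[index]; }
    }

    /** Max via iteration: every value is read, only the last one is kept. */
    static long maxViaIterator(IteratorView in) {
        long value = 0; // 0 stands in for "missing", as in the 6.x snippet
        int count = in.docValueCount();
        for (int i = 0; i < count; i++) {
            value = in.nextValue();
        }
        return value;
    }

    /** Max via random access: a single lookup at index count-1. */
    static long maxViaRandomAccess(RandomAccessView in) {
        int count = in.count();
        return count == 0 ? 0 : in.valueAt(count - 1);
    }

    public static void main(String[] args) {
        long[] values = {7, 3, 42, 15};
        IteratorView it = new IteratorView(values);
        RandomAccessView ra = new RandomAccessView(values);
        System.out.println(maxViaIterator(it) + " after " + it.reads + " reads");
        System.out.println(maxViaRandomAccess(ra) + " after " + ra.reads + " reads");
    }
}
```

Both methods return the same maximum, but the iterator version reads as many 
values as the document has, while the random-access version reads one. The 
sketch says nothing about how a codec would store the values, which is where 
the compression argument for the sequential API comes in.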



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
