Adrien Grand created LUCENE-7460:
------------------------------------

             Summary: Should SortedNumericDocValues expose a per-document random-access API?
                 Key: LUCENE-7460
                 URL: https://issues.apache.org/jira/browse/LUCENE-7460
             Project: Lucene - Core
          Issue Type: Wish
            Reporter: Adrien Grand
            Priority: Minor


Sorted numerics used to expose a per-document random-access API so that 
accessing the median or max element would be cheap. The new 
SortedNumericDocValues still exposes the number of values a document has, but 
the only way to read values is to call {{nextValue()}}, which forces callers to 
read all of a document's values just to get the max value.

For instance, {{SortedNumericSelector.MAX}} does the following in master (the 
important part is the for-loop):

{code}
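    // Reads every value of the current document; since values are sorted in
    // ascending order, 'value' ends up holding the last one, i.e. the max.
    private void setValue() throws IOException {
      int count = in.docValueCount();
      for (int i = 0; i < count; i++) {
        value = in.nextValue();
      }
    }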

    @Override
    public int nextDoc() throws IOException {
      int docID = in.nextDoc();
      if (docID != NO_MORE_DOCS) {
        setValue();
      }
      return docID;
    }
{code}

while it used to simply look up the value at index {{count-1}} in 6.x:

{code}
    @Override
    public long get(int docID) {
      in.setDocument(docID);
      final int count = in.count();
      if (count == 0) {
        return 0; // missing
      } else {
        return in.valueAt(count-1); // values are sorted: the max is a single lookup
      }
    }
{code}
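To make the cost difference concrete, here is a small self-contained sketch that models both API shapes over one document's sorted values and counts how many values each approach reads to get the max and the (lower) median. The interfaces {{SequentialValues}} and {{RandomAccessValues}} and the class {{ArrayValues}} are hypothetical stand-ins, not Lucene classes; they only mirror the shape of {{docValueCount()}}/{{nextValue()}} in master and {{count()}}/{{valueAt(int)}} in 6.x. Returning 0 for a document without values follows the 6.x snippet above.

```java
import java.util.Arrays;

public class ValueAccessSketch {

  /** Hypothetical stand-in for the sequential API in master. */
  interface SequentialValues {
    int docValueCount();
    long nextValue();
  }

  /** Hypothetical stand-in for the random-access API in 6.x. */
  interface RandomAccessValues {
    int count();
    long valueAt(int index);
  }

  /** One document's values, sorted ascending, counting how many values get read. */
  static final class ArrayValues implements SequentialValues, RandomAccessValues {
    private final long[] values;
    private int upto = 0; // cursor used by nextValue()
    int decoded = 0;      // number of values read so far

    ArrayValues(long... values) {
      this.values = values.clone();
      Arrays.sort(this.values);
    }

    @Override public int docValueCount() { return values.length; }
    @Override public int count() { return values.length; }

    @Override public long nextValue() {
      decoded++;
      return values[upto++];
    }

    @Override public long valueAt(int index) {
      decoded++;
      return values[index];
    }
  }

  // Sequential max: the largest value comes last, so all of them are read.
  static long maxSequential(SequentialValues in) {
    int count = in.docValueCount();
    long value = 0; // stays 0 for a document without values
    for (int i = 0; i < count; i++) {
      value = in.nextValue();
    }
    return value;
  }

  // Sequential lower median: still a linear scan up to index (count - 1) / 2.
  static long medianSequential(SequentialValues in) {
    int count = in.docValueCount();
    if (count == 0) {
      return 0;
    }
    long value = 0;
    for (int i = 0; i <= (count - 1) / 2; i++) {
      value = in.nextValue();
    }
    return value;
  }

  // Random access: a single lookup each.
  static long maxRandomAccess(RandomAccessValues in) {
    int count = in.count();
    return count == 0 ? 0 : in.valueAt(count - 1);
  }

  static long medianRandomAccess(RandomAccessValues in) {
    int count = in.count();
    return count == 0 ? 0 : in.valueAt((count - 1) / 2);
  }

  public static void main(String[] args) {
    long[] doc = {42, 3, 17, 8, 99, 25, 61};
    ArrayValues a = new ArrayValues(doc);
    ArrayValues b = new ArrayValues(doc);
    ArrayValues c = new ArrayValues(doc);
    ArrayValues d = new ArrayValues(doc);
    System.out.println("sequential max=" + maxSequential(a) + ", values read=" + a.decoded);
    System.out.println("random-access max=" + maxRandomAccess(b) + ", values read=" + b.decoded);
    System.out.println("sequential median=" + medianSequential(c) + ", values read=" + c.decoded);
    System.out.println("random-access median=" + medianRandomAccess(d) + ", values read=" + d.decoded);
  }
}
```

For this 7-value document the sequential path reads 7 values for the max and 4 for the median (25), while the random-access path reads 1 value in each case; the gap grows linearly with the number of values per document.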

This could be a conscious decision, since a sequential API gives the codec more 
opportunities to compress efficiently, but on the other hand this API prevents 
sorting by max or median values from being efficient.
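The compression side of the trade-off can be illustrated with a toy encoding. Because a document's values are sorted, one plausible scheme (an assumption for illustration only, not a description of what Lucene's codecs actually do) is to store the first value followed by the non-negative gaps between consecutive values, which are small when values are close and so would need fewer bits if bit-packed. Decoding then has to accumulate gaps from the start: {{nextValue()}}-style iteration costs one addition per value, while looking up index {{i}} costs O(i) unless the codec also stores extra data. The class {{DeltaEncodedValues}} is hypothetical.

```java
import java.util.Arrays;

public class DeltaEncodedValues {
  // deltas[0] holds the first value; each later entry is the gap to the previous value.
  private final long[] deltas;
  private int upto = 0;     // cursor used by nextValue()
  private long current = 0; // last value returned by nextValue()

  DeltaEncodedValues(long[] sortedValues) {
    deltas = new long[sortedValues.length];
    long previous = 0;
    for (int i = 0; i < sortedValues.length; i++) {
      deltas[i] = sortedValues[i] - previous; // small when neighboring values are close
      previous = sortedValues[i];
    }
  }

  int docValueCount() {
    return deltas.length;
  }

  // Sequential decoding: one addition per value.
  long nextValue() {
    current += deltas[upto++];
    return current;
  }

  // Random access without extra index structures: sums every delta up to index, O(index).
  long valueAt(int index) {
    long value = 0;
    for (int i = 0; i <= index; i++) {
      value += deltas[i];
    }
    return value;
  }

  public static void main(String[] args) {
    long[] sorted = {1000, 1003, 1007, 1010, 1012};
    DeltaEncodedValues enc = new DeltaEncodedValues(sorted);
    System.out.println("deltas=" + Arrays.toString(enc.deltas));

    long max = 0;
    for (int i = 0; i < enc.docValueCount(); i++) {
      max = enc.nextValue();
    }
    System.out.println("max via nextValue()=" + max);
    System.out.println("max via valueAt()=" + enc.valueAt(enc.docValueCount() - 1));
  }
}
```

Running {{main}} prints {{deltas=[1000, 3, 4, 3, 2]}} and then 1012 twice. Both paths touch all five deltas, so with this kind of encoding a random-access API alone would not make the max cheaper; the codec would also need something like per-value offsets or fixed-width packing, which is the compression cost on the other side of the trade-off.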

On my end, I prefer the random-access API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
