Adrien Grand created LUCENE-7460:
------------------------------------
Summary: Should SortedNumericDocValues expose a per-document
random-access API?
Key: LUCENE-7460
URL: https://issues.apache.org/jira/browse/LUCENE-7460
Project: Lucene - Core
Issue Type: Wish
Reporter: Adrien Grand
Priority: Minor
Sorted numerics used to expose a per-document random-access API so that
accessing the median or max element would be cheap. The new
SortedNumericDocValues still exposes the number of values a document has, but
the only way to read values is to call {{nextValue}}, which forces callers to
read every value just to get the max value.

For instance, {{SortedNumericSelector.MAX}} does the following in master (the
important part is the for-loop):
{code}
private void setValue() throws IOException {
  int count = in.docValueCount();
  for (int i = 0; i < count; i++) {
    value = in.nextValue();
  }
}

@Override
public int nextDoc() throws IOException {
  int docID = in.nextDoc();
  if (docID != NO_MORE_DOCS) {
    setValue();
  }
  return docID;
}
{code}
while it used to simply look up the value at index {{count-1}} in 6.x:
{code}
@Override
public long get(int docID) {
  in.setDocument(docID);
  final int count = in.count();
  if (count == 0) {
    return 0; // missing
  } else {
    return in.valueAt(count-1);
  }
}
{code}
This could be a conscious decision, since a sequential API gives the codec more
opportunities to compress efficiently. On the other hand, this API prevents
sorting by max or median values from being efficient.

On my end, I have a preference for the random-access API.
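
To make the cost difference concrete, here is a minimal, self-contained sketch that counts how many values each access style has to read. {{DocValuesSketch}} and all of its methods are hypothetical stand-ins written for this illustration; they are not Lucene classes, and "median" is simplified to the upper middle element.

```java
import java.util.Arrays;

/** Hypothetical stand-in for one document's sorted values; not Lucene's API. */
final class DocValuesSketch {
    private final long[] sorted; // values of a single document, ascending
    private int cursor = 0;      // position of the sequential reader
    int reads = 0;               // how many values have been read so far

    DocValuesSketch(long... values) {
        sorted = values.clone();
        Arrays.sort(sorted);
    }

    int count() {
        return sorted.length;
    }

    /** Sequential style: values can only be consumed in ascending order. */
    long nextValue() {
        reads++;
        return sorted[cursor++];
    }

    /** Random-access style: any index can be read directly. */
    long valueAt(int index) {
        reads++;
        return sorted[index];
    }

    /** Max via the sequential style: every value is read, the last one wins. */
    static long maxSequential(DocValuesSketch doc) {
        long value = 0; // returned as-is when the document has no values
        for (int i = 0, count = doc.count(); i < count; i++) {
            value = doc.nextValue();
        }
        return value;
    }

    /** Max via the random-access style: a single read at index count-1. */
    static long maxRandomAccess(DocValuesSketch doc) {
        int count = doc.count();
        return count == 0 ? 0 : doc.valueAt(count - 1);
    }

    /** Upper middle element via the sequential style: count/2 + 1 reads. */
    static long medianSequential(DocValuesSketch doc) {
        long value = 0;
        for (int i = 0, mid = doc.count() / 2; i <= mid && i < doc.count(); i++) {
            value = doc.nextValue();
        }
        return value;
    }

    /** Upper middle element via the random-access style: a single read. */
    static long medianRandomAccess(DocValuesSketch doc) {
        int count = doc.count();
        return count == 0 ? 0 : doc.valueAt(count / 2);
    }

    public static void main(String[] args) {
        DocValuesSketch seq = new DocValuesSketch(7, 3, 9, 1, 5);
        DocValuesSketch rnd = new DocValuesSketch(7, 3, 9, 1, 5);
        System.out.println(maxSequential(seq) + " after " + seq.reads + " reads");
        System.out.println(maxRandomAccess(rnd) + " after " + rnd.reads + " reads");
    }
}
```

In this toy model the sequential selector reads all five values to find the max, while the random-access one reads a single value. The sketch says nothing about decoding cost inside a real codec, which is where a sequential API may win back some of that difference.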