[
https://issues.apache.org/jira/browse/LUCENE-8885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16875936#comment-16875936
]
ASF subversion and git services commented on LUCENE-8885:
---------------------------------------------------------
Commit db68634c67731d908d69d44232730ed92424e828 in lucene-solr's branch
refs/heads/master from Ignacio Vera
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=db68634 ]
LUCENE-8885: Optimise BKD reader by exploiting cardinality information stored
on leaves (#746)
The commit adds the method InstersectVisitor#visit(DocIdSetIterator, byte[]).
> Optimise BKD reader by exploiting cardinality information stored on leaves
> --------------------------------------------------------------------------
>
> Key: LUCENE-8885
> URL: https://issues.apache.org/jira/browse/LUCENE-8885
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Ignacio Vera
> Priority: Major
> Time Spent: 1h 40m
> Remaining Estimate: 0h
>
> In LUCENE-8688 it was introduce a new storing strategy for leaves contains
> duplicated points. In such case the points are stored together with the
> cardinality. We still call the IntersectVisitor once per document therefore
> we are checking many times the same point agains the query. The idea is to
> check the point once and then add all the documents.
> The API of the IntersectVisitor does not allow that, and therefore to exploit
> that property we need to either change the API or extend it. Here are the
> possibilities I can think of:
> 1) Modify the API by replacing the method IntersectVisitor#visit(byte[], int)
> by the following method:
> {code:java}
> /** Called for leaf cells that intersects the leaf to test if the point
> matches to the query
> * In case it matches, the implementor must call {@link
> IntersectVisitor#visit(int)} with the
> * documents associated with this point are visited */
> boolean matches(byte[] packedValue) throws IOException;
> {code}
> This will allow the BKD reader to check if a point matches the query and if
> true then Coll the method IntersectVisitor#visit(int) for all documents
> associated with that point.
> The drawback of this approach is backwards compatibility and the need to
> update all classes implement this interface.
> 2) Extends the API by adding a new default method in the IntersectVisitor
> interface:
> {code:java}
> /** Called for documents in a leaf cell that crosses the query. The consumer
> * should scrutinize the packedValue to decide whether to accept it. If
> accepted it should
> * consider only the {@code numberDocs} documents starting at {@code
> offset} In the 1D case,
> * values are visited in increasing order, and in the case of ties, in
> increasing
> * docID order. */
> default void visit(int[] docID, int offset, int numberDocs, byte[]
> packedValue) throws IOException {
> for ( int i =offset; i < offset + numberDocs; i++) {
> visit(docID[i], packedValue);
> }
> }
> {code}
> The merit of this approach is that is backwards compatible and it is up to
> the implementors to override this method and get the benefits for this
> optimisation.The biggest downside is that it assumes that the codec has doc
> IDs available in an int[] slice as opposed to streaming them from disk
> directly to the IntersectVisitor for instance as [~jpountz] noted.
> Maybe there are more options I did not think about so looking forward to
> hearing opining if we should do this change at all and if so, how to approach
> it. My +1 goes to 1).
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]