[
https://issues.apache.org/jira/browse/LUCENE-8867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16866798#comment-16866798
]
Ignacio Vera commented on LUCENE-8867:
--------------------------------------
{quote}
This is only an issue in the case that not all dimensions are indexed, right?
Otherwise you could figure out that all values are equal in
IntersectVisitor#compare?
{quote}
I think this is a generic issue. The problem here is not when all values are
equal but when the leaf nodes have very low cardinality. In that case we can
save a lot of space by storing the values in the proposed way.
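
To make the space argument concrete, here is a minimal sketch of the idea,
not the code in the patch: consecutive equal values in a leaf are collapsed
into (value, count) pairs. The names LowCardinalityLeaf, Run and encode are
made up for illustration, and the sketch assumes the values in the leaf are
already sorted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LowCardinalityLeaf {
    /** One distinct packed value plus the number of consecutive docs sharing it. */
    static final class Run {
        final byte[] value;
        final int count;

        Run(byte[] value, int count) {
            this.value = value;
            this.count = count;
        }
    }

    /** Groups consecutive equal values; assumes the leaf's values are sorted. */
    static List<Run> encode(byte[][] sortedValues) {
        List<Run> runs = new ArrayList<>();
        int i = 0;
        while (i < sortedValues.length) {
            int j = i + 1;
            // Extend the run while the next value is byte-for-byte identical.
            while (j < sortedValues.length
                    && Arrays.equals(sortedValues[i], sortedValues[j])) {
                j++;
            }
            runs.add(new Run(sortedValues[i], j - i));
            i = j;
        }
        return runs;
    }

    public static void main(String[] args) {
        // Five docs but only two distinct values in this leaf.
        byte[][] leaf = { {1}, {1}, {1}, {7}, {7} };
        for (Run r : encode(leaf)) {
            System.out.println(Arrays.toString(r.value) + " x " + r.count);
        }
    }
}
```

With n docs and k distinct values, this stores k values and k counts instead
of n values, which only pays off when k is much smaller than n.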
{quote}
One concern I have with the patch is that it assumes that the codec has doc IDs
available in an int[] slice as opposed to streaming them from disk directly to
the IntersectVisitor for instance.
{quote}
I see your concern. Another option would be to change the interface more
radically: add a matches(byte[]) method and then use the visit(docID) method.
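
A rough sketch of what that alternative could look like, assuming a
simplified stand-in for the visitor rather than the real Lucene interface.
The names MatchThenVisit, Visitor and visitRun are hypothetical. The value is
tested once per distinct value, and doc IDs are pulled from an iterator, so
the codec does not need to hold them in an int[] slice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PrimitiveIterator;
import java.util.stream.IntStream;

public class MatchThenVisit {
    /** Hypothetical visitor shape: test a value once, then collect doc IDs. */
    interface Visitor {
        boolean matches(byte[] packedValue);

        void visit(int docID);
    }

    /** Visits one distinct value together with a stream of the docs that share it. */
    static void visitRun(byte[] value, PrimitiveIterator.OfInt docIDs, Visitor visitor) {
        boolean match = visitor.matches(value); // evaluated once per distinct value
        while (docIDs.hasNext()) {
            int docID = docIDs.nextInt(); // always consume, so the stream stays positioned
            if (match) {
                visitor.visit(docID);
            }
        }
    }

    public static void main(String[] args) {
        List<Integer> hits = new ArrayList<>();
        Visitor visitor = new Visitor() {
            @Override
            public boolean matches(byte[] packedValue) {
                return packedValue[0] >= 5; // toy single-byte range check
            }

            @Override
            public void visit(int docID) {
                hits.add(docID);
            }
        };
        visitRun(new byte[] {7}, IntStream.of(10, 11, 12).iterator(), visitor);
        visitRun(new byte[] {1}, IntStream.of(20, 21).iterator(), visitor);
        System.out.println(hits);
    }
}
```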
> Optimise BKD tree for low cardinality leaves
> --------------------------------------------
>
> Key: LUCENE-8867
> URL: https://issues.apache.org/jira/browse/LUCENE-8867
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Ignacio Vera
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Currently, if a leaf of the BKD tree contains only a few distinct values, the
> leaf is treated the same way as if all values were different. In many cases it
> can be much more efficient to store the distinct values with their cardinality.
> In addition, in this case the method IntersectVisitor#visit(docId, byte[]) is
> called n times with the same byte array but a different docID. This issue
> proposes to add a new method to the interface that accepts an array of docs,
> so that it can be overridden by implementors to gain search performance.
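
As an illustration of the proposal in the description above, the following is
a minimal sketch that uses a simplified stand-in interface, not the actual
IntersectVisitor. The bulk method signature, the BulkVisit class and the
RangeCounter implementation are assumptions made for the example, and values
are reduced to a single byte to keep it short.

```java
public class BulkVisit {
    /** Simplified stand-in for a points visitor. */
    interface Visitor {
        void visit(int docID, byte[] packedValue);

        /** Hypothetical bulk variant; the default falls back to per-doc calls. */
        default void visit(int[] docIDs, int offset, int count, byte[] packedValue) {
            for (int i = offset; i < offset + count; i++) {
                visit(docIDs[i], packedValue);
            }
        }
    }

    /** Counts docs whose single-byte value lies in [min, max]. */
    static final class RangeCounter implements Visitor {
        final byte min;
        final byte max;
        int hits;

        RangeCounter(byte min, byte max) {
            this.min = min;
            this.max = max;
        }

        private boolean inRange(byte[] v) {
            return v[0] >= min && v[0] <= max;
        }

        @Override
        public void visit(int docID, byte[] packedValue) {
            if (inRange(packedValue)) {
                hits++;
            }
        }

        /** Override: compare the shared value once, then accept the whole slice. */
        @Override
        public void visit(int[] docIDs, int offset, int count, byte[] packedValue) {
            if (inRange(packedValue)) {
                hits += count;
            }
        }
    }

    public static void main(String[] args) {
        RangeCounter counter = new RangeCounter((byte) 1, (byte) 5);
        counter.visit(new int[] {3, 8, 9}, 0, 3, new byte[] {4}); // one comparison, three docs
        counter.visit(new int[] {12}, 0, 1, new byte[] {9});      // out of range
        System.out.println(counter.hits);
    }
}
```

Implementations that do not override the bulk method keep their current
behaviour through the default, so the change would be backwards compatible
for existing visitors.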