On Tue, Jul 20, 2021 at 5:50 PM Michael McCandless <luc...@mikemccandless.com> wrote:
>> To my knowledge, we don't have more deduplication logic. When an inner
>> block has a single value, the IntersectVisitor likely returns
>> CELL_INSIDE_QUERY and Lucene will only collect doc IDs for all leaf
>> blocks that are under this inner block without doing any additional
>> comparison. This should be quite good already.
>>
>
> I agree, this should be very effective. We check the value once, and then
> go on to collect or skip those 1K (default maxPointsInLeafNode) docids.
>

It's actually 512 since LUCENE-9087 <https://issues.apache.org/jira/browse/LUCENE-9087>. :)

(And an even more interesting feature of this change is that it changed
n-dimensional BKD trees to always have 512 points per leaf, while before
they had between 512 and 1024 points per leaf, depending on the number of
points in the segment.)

--
Adrien
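The CELL_INSIDE_QUERY shortcut discussed in this thread can be illustrated with a small, self-contained Java sketch. It is a toy 1-D simulation under stated assumptions, not Lucene's code: the class `BkdRangeSketch`, its `Stats` counter and `rangeQuery` method are hypothetical names, and the `Relation` enum only mirrors the constant names of Lucene's `PointValues.Relation`. Points are kept sorted, each cell is split in half, and cells of at most 512 points (the `maxPointsInLeafNode` value mentioned above) are treated as leaves. When a cell's value range lies entirely inside the query, every doc ID under it is collected without comparing individual values.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy 1-D BKD-style range query (hypothetical; not Lucene's implementation).
 * Relation only mirrors the constant names of Lucene's PointValues.Relation.
 */
class BkdRangeSketch {

    enum Relation { CELL_INSIDE_QUERY, CELL_OUTSIDE_QUERY, CELL_CROSSES_QUERY }

    /** Counts how many individual point values were compared to the query. */
    static final class Stats {
        int perPointComparisons;
    }

    private final int[] values;  // point values, sorted ascending
    private final int[] docIds;  // docIds[i] is the document holding values[i]
    private final int maxPointsInLeaf;

    BkdRangeSketch(int[] sortedValues, int[] docIds, int maxPointsInLeaf) {
        if (sortedValues.length != docIds.length || maxPointsInLeaf < 1) {
            throw new IllegalArgumentException("bad tree parameters");
        }
        this.values = sortedValues;
        this.docIds = docIds;
        this.maxPointsInLeaf = maxPointsInLeaf;
    }

    /** Relation of the cell [cellMin, cellMax] to the query range [lo, hi]. */
    static Relation compare(int cellMin, int cellMax, int lo, int hi) {
        if (cellMax < lo || cellMin > hi) return Relation.CELL_OUTSIDE_QUERY;
        if (cellMin >= lo && cellMax <= hi) return Relation.CELL_INSIDE_QUERY;
        return Relation.CELL_CROSSES_QUERY;
    }

    /** Returns the doc IDs whose value lies in [lo, hi]. */
    List<Integer> rangeQuery(int lo, int hi, Stats stats) {
        List<Integer> hits = new ArrayList<>();
        if (values.length > 0) intersect(0, values.length, lo, hi, hits, stats);
        return hits;
    }

    // Visits the cell covering values[from, to).
    private void intersect(int from, int to, int lo, int hi, List<Integer> hits, Stats stats) {
        // Values are sorted, so the cell's bounds are its first and last values.
        Relation rel = compare(values[from], values[to - 1], lo, hi);
        if (rel == Relation.CELL_OUTSIDE_QUERY) return;  // skip the whole cell
        if (rel == Relation.CELL_INSIDE_QUERY) {
            // Every point matches: collect doc IDs without looking at values.
            for (int i = from; i < to; i++) hits.add(docIds[i]);
            return;
        }
        if (to - from <= maxPointsInLeaf) {
            // Leaf that crosses a query boundary: test each point.
            for (int i = from; i < to; i++) {
                stats.perPointComparisons++;
                if (values[i] >= lo && values[i] <= hi) hits.add(docIds[i]);
            }
            return;
        }
        int mid = (from + to) >>> 1;  // split the cell into two halves
        intersect(from, mid, lo, hi, hits, stats);
        intersect(mid, to, lo, hi, hits, stats);
    }

    public static void main(String[] args) {
        int n = 2048;
        int[] same = new int[n], distinct = new int[n], docs = new int[n];
        for (int i = 0; i < n; i++) {
            same[i] = 7;      // every document indexes the same value
            distinct[i] = i;  // every document indexes its own value
            docs[i] = i;
        }
        Stats s1 = new Stats();
        List<Integer> h1 = new BkdRangeSketch(same, docs, 512).rangeQuery(7, 7, s1);
        System.out.println(h1.size() + " hits, " + s1.perPointComparisons + " per-point comparisons");

        Stats s2 = new Stats();
        List<Integer> h2 = new BkdRangeSketch(distinct, docs, 512).rangeQuery(100, 1500, s2);
        System.out.println(h2.size() + " hits, " + s2.perPointComparisons + " per-point comparisons");
    }
}
```

With 2048 documents that all share the value 7, the query [7, 7] is answered at the root cell, so `main` first prints `2048 hits, 0 per-point comparisons`. With distinct values 0..2047 and the query [100, 1500], only the two leaves that straddle a query boundary are scanned point by point, so it then prints `1401 hits, 1024 per-point comparisons`: the leaf holding 512..1023 is collected wholesale and the one holding 1536..2047 is skipped.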