Greg, you may find it interesting to see some code in spatial-extras that
works similar to what you describe. There is a so-called
CompositeSpatialStrategy built off a grid using the terms index (not BKD)
plus another that stores the vector geometry into DocValues.  The Query
that does the work is here:
https://github.com/apache/lucene/blob/main/lucene/spatial-extras/src/java/org/apache/lucene/spatial/composite/IntersectsRPTVerifyQuery.java
Only some "shapes" that are at the border of the query shape will require
the potentially expensive DocValues lookup.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Tue, Jun 29, 2021 at 9:30 AM Greg Miller <[email protected]> wrote:

> Hi folks-
>
> I've been spending a little time getting familiar with the BKD-tree-based
> range query support currently implemented in Lucene, and wonder if there's
> ever been a discussion around supporting two-phase iteration in this space.
> If I'm understanding the current implementation properly (specifically
> looking at PointRangeQuery), it appears that all matches are determined
> up-front by 1) identifying segments of the tree that contain candidate
> matches (i.e., containing part of the query range), and then 2) confirming
> whether-or-not the contained points actually fall within the range. I'm
> also a little low on coffee this morning so it's entirely possible I'm
> misunderstanding the current implementation (please correct me if so).
>
> With this approach, it seems like we could potentially be doing quite a
> bit of wasted effort in some situations. I have no thoughts on how to
> actually implement this yet, but I wonder if we could support two-phase
> iteration by 1) returning all docs with points contained in candidate
> BKD-tree segments as an approximation, and then 2) only checking the points
> against the query range when confirming matches in the second phase? I
> think the idea would extend to LatLonPointDistanceQuery as well (and maybe
> others?).
>
> I did a Jira search for a related issue but came up empty. Anyone know if
> this idea has been discussed previously, or if there's some inherent flaw
> with the approach that would make it a non-starter? I don't really have any
> cycles to work on this at the moment, but can at least open a Jira issue to
> track if it seems like a reasonable thing to explore.
>
> Cheers,
> -Greg
>

Reply via email to