Greg, you may find it interesting to see some code in spatial-extras that works similar to what you describe. There is a so-called CompositeSpatialStrategy built off a grid using the terms index (not BKD) plus another that stores the vector geometry into DocValues. The Query that does the work is here: https://github.com/apache/lucene/blob/main/lucene/spatial-extras/src/java/org/apache/lucene/spatial/composite/IntersectsRPTVerifyQuery.java Only some "shapes" that are at the border of the query shape will require the potentially expensive DocValues lookup.
~ David Smiley Apache Lucene/Solr Search Developer http://www.linkedin.com/in/davidwsmiley On Tue, Jun 29, 2021 at 9:30 AM Greg Miller <[email protected]> wrote: > Hi folks- > > I've been spending a little time getting familiar with the BKD-tree-based > range query support currently implemented in Lucene, and wonder if there's > ever been a discussion around supporting two-phase iteration in this space. > If I'm understanding the current implementation properly (specifically > looking at PointRangeQuery), it appears that all matches are determined > up-front by 1) identifying segments of the tree that contain candidate > matches (i.e., containing part of the query range), and then 2) confirming > whether-or-not the contained points actually fall within the range. I'm > also a little low on coffee this morning so it's entirely possible I'm > misunderstanding the current implementation (please correct me if so). > > With this approach, it seems like we could potentially be doing quite a > bit of wasted effort in some situations. I have no thoughts on how to > actually implement this yet, but I wonder if we could support two-phase > iteration by 1) returning all docs with points contained in candidate > BKD-tree segments as an approximation, and then 2) only checking the points > against the query range when confirming matches in the second phase? I > think the idea would extend to LatLonPointDistanceQuery as well (and maybe > others?). > > I did a Jira search for a related issue but came up empty. Anyone know if > this idea has been discussed previously, or if there's some inherent flaw > with the approach that would make it a non-starter? I don't really have any > cycles to work on this at the moment, but can at least open a Jira issue to > track if it seems like a reasonable thing to explore. > > Cheers, > -Greg >
