Greg, I don't think this works the way that you are imagining. It doesn't go document-at-a-time actually, and I don't think it is a good fit for two-phases.
It also does not do this two-part confirmation. It reads rectangles from the index and determines if a rectangle fits in the range. For example, if i query on Virginia, USA it might iterate like this: phase 1 * descend deeper into the USA rectangle, as it intersects the query range * discard the Germany rectangle, as it is completely outside of query range. phase 2 * discard entire western USA rectangle, outside of query range * descend deeper into eastern USA rectangle ... phase N * accept some county-sized rectangles that fit entirely inside virginia, they are inside query range * descend deeper into some county-sized rectangles on the border with north carolina, west virginia, maryland, etc It might even have to test some individual points at the end of the day to sort out the stuff on the "border" with the other states. But the number of individual points tested should not be large, as the BKD organizes itself based on density. Try this video that Mike made, as I am short on coffee too: https://www.youtube.com/watch?v=8BCf9Kx-AxY On Tue, Jun 29, 2021 at 9:30 AM Greg Miller <[email protected]> wrote: > > Hi folks- > > I've been spending a little time getting familiar with the BKD-tree-based > range query support currently implemented in Lucene, and wonder if there's > ever been a discussion around supporting two-phase iteration in this space. > If I'm understanding the current implementation properly (specifically > looking at PointRangeQuery), it appears that all matches are determined > up-front by 1) identifying segments of the tree that contain candidate > matches (i.e., containing part of the query range), and then 2) confirming > whether-or-not the contained points actually fall within the range. I'm also > a little low on coffee this morning so it's entirely possible I'm > misunderstanding the current implementation (please correct me if so). > > With this approach, it seems like we could potentially be doing quite a bit > of wasted effort in some situations. I have no thoughts on how to actually > implement this yet, but I wonder if we could support two-phase iteration by > 1) returning all docs with points contained in candidate BKD-tree segments as > an approximation, and then 2) only checking the points against the query > range when confirming matches in the second phase? I think the idea would > extend to LatLonPointDistanceQuery as well (and maybe others?). > > I did a Jira search for a related issue but came up empty. Anyone know if > this idea has been discussed previously, or if there's some inherent flaw > with the approach that would make it a non-starter? I don't really have any > cycles to work on this at the moment, but can at least open a Jira issue to > track if it seems like a reasonable thing to explore. > > Cheers, > -Greg --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
