[ https://issues.apache.org/jira/browse/LUCENE-10204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17516171#comment-17516171 ]
Marc D'Mello commented on LUCENE-10204:
---------------------------------------

I think what you say makes sense about how exposing sub-query tracking could limit future optimizations to {{ToParentBlockJoinQuery}}. I think this could be resolved if we create a new class that also does sub-match tracking and don't touch {{ToParentBlockJoinQuery}}, or, as I mentioned earlier, expose the option to do sub-match tracking in the constructor. This way we can retain the current functionality (and the possibility of future optimization) for {{ToParentBlockJoinQuery}}.

But to give some context on why I don't really like the option of evaluating the query a second time: I'm an engineer at Amazon, and since we are working at a large scale, running a second query to get the children will cost too much for us. In addition, we currently use these sub-matches not only for faceting, as was mentioned in the issue, but also for ranking models (just to provide another example of a use case). All this just to say that we want to get sub-matches as quickly and efficiently as possible, since we use them very often.

Also, I took a look at the original issue where I think this feature was removed (LUCENE-6959) to get an idea of why it was removed in the first place. It seemed that it used a special collector for this query, which in turn required a special {{IndexSearcher}}? It doesn't seem like an issue we would run into again if we use a different implementation, but I could be overlooking something, as I lack context on this.
> Support iteration of sub-matches in join queries (ToParentBlockJoinQuery / ToChildBlockJoinQuery)
> -------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-10204
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10204
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/join
>            Reporter: Greg Miller
>            Priority: Minor
>
> It would be nice to be able to iterate over the "sub-matches" in these join
> queries for the purpose of faceting (or possibly other use-cases?).
>
> For example, we have a use-case where our query matches on "child" docs,
> using a {{ToParentBlockJoinQuery}} to "emit" the associated parents, which
> are ultimately added to our match set. But, we want to iterate over the
> matching "children" for the purpose of faceting.
>
> To make it concrete, consider searching over a product catalog where "offers"
> and "items" are indexed side-by-side, with the offers being represented as
> "children" of the parent items. An offer contains information like
> "condition" (new vs. used), selling price, etc. for the parent item. If we
> want to facet on "condition", we want to observe all children that matched
> the query to know if the parent item had a "new" or "used" offer (or both).
> This requires iterating over the child matches when faceting, which we cannot
> do today since the child hit information isn't retained anywhere.
>
> We can support this by "caching" the child hits in a bitset but there is some
> complexity when multiple join queries appear in a query structure (would need
> to logically combine various "cached" bitsets using the same boolean
> operations as in the original query structure).
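
To illustrate the bitset "caching" idea from the issue description, here is a minimal, hedged sketch in plain Java. It uses only {{java.util.BitSet}} and no Lucene classes; the class name {{ChildHitCacheSketch}}, its methods, and the sample doc IDs are all hypothetical and are not part of any existing or proposed Lucene API. The only Lucene convention it relies on is that, in a block-indexed segment, children immediately precede their parent. It shows two join clauses whose cached child hits are combined with the same boolean operation (AND) that the enclosing query would apply to the parents, followed by faceting over the matching children.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch only: none of these names exist in Lucene.
class ChildHitCacheSketch {

    /** Maps cached child hits to their parents (the first parent bit at or after each child). */
    static BitSet parentsOf(BitSet parentBits, BitSet childHits) {
        BitSet parents = new BitSet();
        for (int c = childHits.nextSetBit(0); c >= 0; c = childHits.nextSetBit(c + 1)) {
            int p = parentBits.nextSetBit(c);
            if (p >= 0) {
                parents.set(p);
            }
        }
        return parents;
    }

    /** For each facet value, counts matched parents with at least one matching child carrying that value. */
    static Map<String, Integer> facetCounts(
            BitSet parentBits, BitSet matchedParents, BitSet childHits, String[] condition) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (int p = matchedParents.nextSetBit(0); p >= 0; p = matchedParents.nextSetBit(p + 1)) {
            // The block starts right after the previous parent (or at doc 0).
            int start = parentBits.previousSetBit(p - 1) + 1;
            Set<String> seen = new LinkedHashSet<>();
            for (int c = childHits.nextSetBit(start); c >= 0 && c < p; c = childHits.nextSetBit(c + 1)) {
                seen.add(condition[c]);
            }
            for (String value : seen) {
                counts.merge(value, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Two blocks: children 0-1 with parent 2, children 3-5 with parent 6.
        BitSet parentBits = new BitSet();
        parentBits.set(2);
        parentBits.set(6);
        String[] condition = {"new", "used", null, "used", "new", "new", null};

        // Child hits cached separately for two (hypothetical) join clauses A and B.
        BitSet hitsA = new BitSet();
        hitsA.set(0);
        hitsA.set(4);
        BitSet hitsB = new BitSet();
        hitsB.set(3);

        // Parent-level "A AND B": intersect the parents each clause would emit.
        BitSet matchedParents = parentsOf(parentBits, hitsA);
        matchedParents.and(parentsOf(parentBits, hitsB));

        // Children to facet on: hits from either clause, restricted to matched blocks.
        BitSet childHits = (BitSet) hitsA.clone();
        childHits.or(hitsB);

        System.out.println(facetCounts(parentBits, matchedParents, childHits, condition));
    }
}
```

Whether the children of an AND should be the union of both clauses' hits, as assumed here, is exactly the kind of semantic question the "complexity" remark in the description points at; other choices are possible.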