[jira] [Commented] (LUCENE-10204) Support iteration of sub-matches in join queries (ToParentBlockJoinQuery / ToChildBlockJoinQuery)

Marc D'Mello (Jira) Fri, 01 Apr 2022 16:51:05 -0700


    [ 
https://issues.apache.org/jira/browse/LUCENE-10204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17516171#comment-17516171
 ]


Marc D'Mello commented on LUCENE-10204:
---------------------------------------

Your point that exposing sub-query tracking could limit future optimizations 
to {{ToParentBlockJoinQuery}} makes sense. I think this could be resolved if 
we create a new class that also does sub-match tracking and leave 
{{ToParentBlockJoinQuery}} untouched, or, as I mentioned earlier, expose 
sub-match tracking as an option in the constructor. This way we can retain 
the current functionality (and the possibility of future optimization) for 
{{ToParentBlockJoinQuery}}.
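
As a rough illustration of the constructor option, here is a minimal sketch. 
It does not use Lucene's actual classes: TrackingParentJoinSketch, its 
trackChildren flag and its methods are hypothetical names, and a plain 
java.util.BitSet stands in for Lucene's own bit sets. It only assumes the 
block layout that block joins rely on, where child docs are indexed 
immediately before their parent.

```java
import java.util.BitSet;

// Hypothetical sketch, NOT Lucene's ToParentBlockJoinQuery. It only models
// the block layout: child docs come right before their parent doc.
final class TrackingParentJoinSketch {
    private final BitSet parentBits;      // one bit set per parent doc ID
    private final boolean trackChildren;  // opt-in flag (assumed name)
    private final BitSet matchedChildren = new BitSet();

    TrackingParentJoinSketch(BitSet parentBits, boolean trackChildren) {
        this.parentBits = parentBits;
        this.trackChildren = trackChildren;
    }

    /** Maps matching child doc IDs to the set of parent doc IDs they belong to. */
    BitSet join(int[] childHits) {
        BitSet parents = new BitSet();
        for (int child : childHits) {
            // The parent is the first parent bit at or after the child.
            int parent = parentBits.nextSetBit(child);
            if (parent < 0) {
                continue; // no enclosing block; ignored in this sketch
            }
            parents.set(parent);
            if (trackChildren) {
                matchedChildren.set(child); // extra work only when requested
            }
        }
        return parents;
    }

    /** Tracked child hits in the block that ends at the given parent doc ID. */
    BitSet childrenOf(int parent) {
        // The block starts right after the previous parent (or at doc 0).
        int firstChild = parentBits.previousSetBit(parent - 1) + 1;
        BitSet result = new BitSet();
        for (int c = matchedChildren.nextSetBit(firstChild);
             c >= 0 && c < parent;
             c = matchedChildren.nextSetBit(c + 1)) {
            result.set(c);
        }
        return result;
    }
}
```

With trackChildren set to false the sketch does no extra bookkeeping, which 
is the point of making it an option: callers who don't need sub-matches keep 
today's behavior.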

To give some context on why I don't really like the option of evaluating the 
query a second time: I'm an engineer at Amazon, and at the scale we work at, 
running a second query to get the children would cost too much for us. In 
addition, we currently use these sub-matches not only for faceting, as 
mentioned in the issue, but also for ranking models (just to provide another 
example of a use case). All this is to say that we want to get sub-matches as 
quickly and efficiently as possible, since we use them very often.

Also, I took a look at the original issue where (I think) this feature was 
removed (LUCENE-6959) to get an idea of why it was removed in the first 
place. It seems the old implementation used a special collector for this 
query, which in turn required a special {{IndexSearcher}}? That doesn't seem 
like an issue we would run into again if we use a different implementation, 
but I could be overlooking something, as I lack context on this.

> Support iteration of sub-matches in join queries (ToParentBlockJoinQuery / 
> ToChildBlockJoinQuery)
> -------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-10204
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10204
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/join
>            Reporter: Greg Miller
>            Priority: Minor
>
> It would be nice to be able to iterate over the "sub-matches" in these join 
> queries for the purpose of faceting (or possibly other use-cases?).
> For example, we have a use-case where our query matches on "child" docs, 
> using a {{ToParentBlockJoinQuery}} to "emit" the associated parents, which 
> are ultimately added to our match set. But, we want to iterate over the 
> matching "children" for the purpose of faceting.
> To make it concrete, consider searching over a product catalog where "offers" 
> and "items" are indexed side-by-side, with the offers being represented as 
> "children" of the parent items. An offer contains information like 
> "condition" (new vs. used), selling price, etc. for the parent item. If we 
> want to facet on "condition", we want to observe all children that matched 
> the query to know if the parent item had a "new" or "used" offer (or both). 
> This requires iterating over the child matches when faceting, which we cannot 
> do today since the child hit information isn't retained anywhere.
> We can support this by "caching" the child hits in a bitset but there is some 
> complexity when multiple join queries appear in a query structure (would need 
> to logically combine various "cached" bitsets using the same boolean 
> operations as in the original query structure).



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
