Actually, I do not see how this can work efficiently with per-hit queries after the join.
For each of the final joined hits, you must 1) retrieve the join key value(s) by pulling doc values iterators and advancing to the right docid, and 2) run another query to "join backwards" to the hits from the left side of the join. I don't see how step 2) can work efficiently when there are many possible hits on the left side that might have matched those join keys.

Elasticsearch offers query time joins ... I wonder how it retrieves and returns hits from both left and right?

It seems like the left side of the join must retain some state to know which top hits corresponded to those join values, and then expose an API to retrieve them?

Mike McCandless

http://blog.mikemccandless.com

On Wed, May 20, 2020 at 6:31 PM Michael McCandless <luc...@mikemccandless.com> wrote:

> I am trying first to understand the proposed solution from the previous
> thread.
>
> You run query #1, it returns top N hits. From those hits you ask JoinUtil
> to create the "joined" query #2. You run the query #2 to get the top final
> (joined) hits.
>
> Then, to reconstruct which docids from query #1 matched which hits from
> query #2, do you run a new query for every hit out of query #2? E.g. if
> you want top 10 hits, you must run 10 new queries in the end, to match up
> each docid in the final result set with each docid hit from query #1?
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Tue, May 12, 2020 at 12:23 PM Stefan Onofrei <stefanonof...@gmail.com>
> wrote:
>
>> Hi,
>>
>> When using Lucene’s query-time join feature [1], how can the hits from the
>> first phase which determine / contribute to the returned results be
>> retrieved?
>>
>> This topic has been brought up before [2], and at the time the
>> recommendation was to re-run the query with added constraints based on the
>> join fields values. Is there any alternative way of doing this when trying
>> to get the contributing hits for every returned result and in the context
>> of having multiple terms in the toField?
>>
>> I see that the info that is being tracked by the Join API refers to the
>> scores and the terms collected in the first phase. During this feature’s
>> development [3] there was also a 3-phased approach taken into
>> consideration, which involved recording fromSearcher’s docIds, translating
>> them into joinable terms and then recording toSearcher’s docIds. However,
>> even if docId info would be recorded between phases, it would then have to
>> be exposed somehow.
>>
>> Thanks,
>> Stefan Onofrei
>>
>> [1] https://lucene.apache.org/core/8_5_1/join/org/apache/lucene/search/join/JoinUtil.html
>> [2] https://lucene.472066.n3.nabble.com/access-to-joined-documents-td4412376.html
>> [3] https://issues.apache.org/jira/browse/LUCENE-3602
>>
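
The reply at the top of this thread suggests that the left side of the join could retain state about which of its hits produced each join value. A minimal sketch of that idea follows. It is a plain-Java model, not Lucene code: the class `JoinStateSketch`, its two methods, and the in-memory maps standing in for the fromField and toField values are all hypothetical names invented for illustration, and nothing here reflects what JoinUtil actually records or exposes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class JoinStateSketch {

    /**
     * Phase 1 (hypothetical): while collecting hits on the "from" side,
     * remember which docids contributed each join term.
     * fromField maps docid -> join term and stands in for doc values.
     */
    public static Map<String, List<Integer>> collectFromSide(
            int[] fromHits, Map<Integer, String> fromField) {
        Map<String, List<Integer>> termToFromDocs = new HashMap<>();
        for (int docId : fromHits) {
            String term = fromField.get(docId);
            if (term != null) {
                termToFromDocs.computeIfAbsent(term, k -> new ArrayList<>()).add(docId);
            }
        }
        return termToFromDocs;
    }

    /**
     * After phase 2 (hypothetical): given the join terms of one final hit
     * (a multi-valued toField), look up the "from" docids that matched it.
     * No extra query is run; this is a map lookup per term.
     */
    public static Set<Integer> contributingFromDocs(
            List<String> toFieldTerms, Map<String, List<Integer>> termToFromDocs) {
        Set<Integer> result = new LinkedHashSet<>();
        for (String term : toFieldTerms) {
            result.addAll(termToFromDocs.getOrDefault(term, List.of()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Three "from" hits; docs 0 and 2 share the join term "a".
        Map<Integer, String> fromField = Map.of(0, "a", 1, "b", 2, "a");
        Map<String, List<Integer>> state = collectFromSide(new int[] {0, 1, 2}, fromField);

        // One joined hit whose toField holds the terms "a" and "c".
        System.out.println(contributingFromDocs(List.of("a", "c"), state)); // prints [0, 2]
    }
}
```

The trade-off in this sketch is memory: the map grows with the number of first-phase hits, which is the cost of avoiding one extra query per final hit.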