First, I am trying to understand the proposed solution from the previous thread.
You run query #1, and it returns the top N hits. From those hits, you ask JoinUtil to create the "joined" query #2. You run query #2 to get the final top (joined) hits.

Then, to reconstruct which docids from query #1 matched which hits from query #2, do you run a new query for every hit out of query #2? E.g. if you want the top 10 hits, must you run 10 new queries in the end, to match up each docid in the final result set with each docid hit from query #1?

Mike McCandless

http://blog.mikemccandless.com

On Tue, May 12, 2020 at 12:23 PM Stefan Onofrei <stefanonof...@gmail.com> wrote:

> Hi,
>
> When using Lucene’s query-time join feature [1], how can the hits from the
> first phase which determine / contribute to the returned results be
> retrieved?
>
> This topic has been brought up before [2], and at the time the
> recommendation was to re-run the query with added constraints based on the
> join fields values. Is there any alternative way of doing this when trying
> to get the contributing hits for every returned result and in the context
> of having multiple terms in the toField?
>
> I see that the info that is being tracked by the Join API refers to the
> scores and the terms collected in the first phase. During this feature’s
> development [3] there was also a 3-phased approach taken into
> consideration, which involved recording fromSearcher’s docIds, translating
> them into joinable terms and then recording toSearcher’s docIds. However,
> even if docId info would be recorded between phases, it would then have to
> be exposed somehow.
>
> Thanks,
> Stefan Onofrei
>
> [1]
> https://lucene.apache.org/core/8_5_1/join/org/apache/lucene/search/join/JoinUtil.html
> [2]
> https://lucene.472066.n3.nabble.com/access-to-joined-documents-td4412376.html
> [3] https://issues.apache.org/jira/browse/LUCENE-3602
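
To make the matching step in the reply above concrete, here is a rough sketch of one possible alternative to running a new query per final hit: group the query #1 hits by join term once, then look up each final hit's toField terms in that map. It is plain Java and makes no Lucene calls. The class and method names (`JoinContributors`, `groupByTerm`, `contributors`) are hypothetical, the fromField is assumed to be single-valued, and the field values are assumed to have already been read from the index for each hit (for example from stored fields or doc values). It only covers the query #1 hits that are passed in.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

public class JoinContributors {

    // Hypothetical helper: group the query #1 hits by their join term.
    // Input maps a "from" docid to its (assumed single) fromField value.
    static Map<String, List<Integer>> groupByTerm(Map<Integer, String> fromHits) {
        Map<String, List<Integer>> byTerm = new HashMap<>();
        for (Map.Entry<Integer, String> hit : fromHits.entrySet()) {
            byTerm.computeIfAbsent(hit.getValue(), t -> new ArrayList<>()).add(hit.getKey());
        }
        return byTerm;
    }

    // For each query #2 hit, collect the query #1 docids that share
    // at least one of its (possibly multiple) toField terms.
    static Map<Integer, List<Integer>> contributors(
            Map<Integer, String> fromHits, Map<Integer, List<String>> toHits) {
        Map<String, List<Integer>> byTerm = groupByTerm(fromHits);
        Map<Integer, List<Integer>> result = new TreeMap<>();
        for (Map.Entry<Integer, List<String>> hit : toHits.entrySet()) {
            TreeSet<Integer> matched = new TreeSet<>(); // sorted, no duplicates
            for (String term : hit.getValue()) {
                matched.addAll(byTerm.getOrDefault(term, List.of()));
            }
            result.put(hit.getKey(), new ArrayList<>(matched));
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up docids and terms, for illustration only.
        Map<Integer, String> fromHits = Map.of(1, "a", 2, "b", 3, "a");
        Map<Integer, List<String>> toHits =
                Map.of(10, List.of("a", "c"), 11, List.of("b", "a"), 12, List.of("z"));
        // prints {10=[1, 3], 11=[1, 2, 3], 12=[]}
        System.out.println(contributors(fromHits, toHits));
    }
}
```

The cost is one pass over each result set instead of one query per final hit, at the price of having to load the join field values for every hit on both sides.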