First, I am trying to understand the solution proposed in the previous
thread.

You run query #1, which returns the top N hits.  From those hits you ask
JoinUtil to create the "joined" query #2.  You then run query #2 to get the
final (joined) top hits.

Then, to reconstruct which docids from query #1 matched which hits from
query #2, do you run a new query for every hit from query #2?  E.g. if
you want the top 10 hits, must you run 10 new queries at the end, to match
up each docid in the final result set with the docids hit by query #1?
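
To make the question concrete, here is a minimal sketch of that matching
step done in memory rather than with one query per hit.  It is plain Java
with no Lucene calls, and everything in it is an assumption for
illustration: the class and method names are hypothetical, and it presumes
the join-field values of the query #1 hits and the toField values of the
query #2 hits have already been loaded (for example from stored fields or
doc values).

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch: not part of Lucene's join module.
public class JoinMatchSketch {

    // Invert "fromDocId -> join terms" into "join term -> fromDocIds".
    static Map<String, Set<Integer>> invert(Map<Integer, Set<String>> fromTerms) {
        Map<String, Set<Integer>> byTerm = new HashMap<>();
        for (Map.Entry<Integer, Set<String>> e : fromTerms.entrySet()) {
            for (String term : e.getValue()) {
                byTerm.computeIfAbsent(term, k -> new TreeSet<>()).add(e.getKey());
            }
        }
        return byTerm;
    }

    // For each query #2 hit, collect the query #1 docids that share at
    // least one join term with it (toField may hold multiple terms).
    static Map<Integer, Set<Integer>> contributors(
            Map<Integer, Set<String>> fromTerms,
            Map<Integer, Set<String>> toTerms) {
        Map<String, Set<Integer>> byTerm = invert(fromTerms);
        Map<Integer, Set<Integer>> result = new TreeMap<>();
        for (Map.Entry<Integer, Set<String>> e : toTerms.entrySet()) {
            Set<Integer> matched = new TreeSet<>();
            for (String term : e.getValue()) {
                matched.addAll(byTerm.getOrDefault(term, Collections.<Integer>emptySet()));
            }
            result.put(e.getKey(), matched);
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up docids and terms, purely for demonstration.
        Map<Integer, Set<String>> fromTerms = new HashMap<>();
        fromTerms.put(1, new HashSet<>(Arrays.asList("a")));
        fromTerms.put(2, new HashSet<>(Arrays.asList("a", "b")));

        Map<Integer, Set<String>> toTerms = new HashMap<>();
        toTerms.put(10, new HashSet<>(Arrays.asList("a", "c")));
        toTerms.put(11, new HashSet<>(Arrays.asList("b")));

        System.out.println(contributors(fromTerms, toTerms));
    }
}
```

The trade-off in this sketch is memory for round trips: it needs the join
terms of every contributing query #1 hit in memory, but it touches each
final hit once instead of issuing a query per hit.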

Mike McCandless

http://blog.mikemccandless.com


On Tue, May 12, 2020 at 12:23 PM Stefan Onofrei <stefanonof...@gmail.com>
wrote:

> Hi,
>
> When using Lucene’s query-time join feature [1], how can the hits from the
> first phase which determine / contribute to the returned results be
> retrieved?
>
> This topic has been brought up before [2], and at the time the
> recommendation was to re-run the query with added constraints based on the
> join fields values. Is there any alternative way of doing this when trying
> to get the contributing hits for every returned result and in the context
> of having multiple terms in the toField?
>
> I see that the info that is being tracked by the Join API refers to the
> scores and the terms collected in the first phase. During this feature’s
> development [3] there was also a 3-phased approach taken into
> consideration, which involved recording fromSearcher’s docIds, translating
> them into joinable terms and then recording toSearcher’s docIds. However,
> even if docId info were recorded between phases, it would then have to
> be exposed somehow.
>
> Thanks,
> Stefan Onofrei
>
> [1] https://lucene.apache.org/core/8_5_1/join/org/apache/lucene/search/join/JoinUtil.html
> [2] https://lucene.472066.n3.nabble.com/access-to-joined-documents-td4412376.html
> [3] https://issues.apache.org/jira/browse/LUCENE-3602
>
