I think the collector approach is perfectly fine for mass-processing of queries.

By the way: Elasticsearch/OpenSearch already have a built-in feature for this, and as far as I remember it works on top of the Collector API, similar to what you described. It is a bit different in that you can tag any clause in a BooleanQuery (so any query) with a "name" (they call it a "named query", https://www.elastic.co/guide/en/elasticsearch/reference/8.2/query-dsl-bool-query.html#named-queries). When you get the search results, each hit tells you which named queries matched it. The actual implementation is a wrapper query around each of those clauses that carries the name. During hit collection it simply gathers all named query instances found in the query tree. I think that in their implementation the wrapper query's scorer somehow adds the name to some global state.
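
To make the named-query idea a bit more concrete, here is a minimal toy sketch in plain Java. It is not the Elasticsearch/OpenSearch implementation and uses no Lucene classes: NamedClause, collect and the sample documents are all hypothetical names invented for illustration, and a clause is modelled as a simple predicate over a field map. It only shows the principle of wrapping each clause with a name and recording, per hit, which names matched.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Toy model only: none of these types come from Lucene or Elasticsearch.
final class NamedClauseSketch {

  /** Hypothetical wrapper: a clause (here just a predicate) plus the name it was tagged with. */
  static final class NamedClause {
    final String name;
    final Predicate<Map<String, String>> matcher;

    NamedClause(String name, Predicate<Map<String, String>> matcher) {
      this.name = name;
      this.matcher = matcher;
    }
  }

  /**
   * Treats the clauses as a disjunction: a document is a hit if at least one
   * clause matches. For every hit, records the names of the clauses that matched.
   */
  static Map<Integer, List<String>> collect(
      List<Map<String, String>> docs, List<NamedClause> clauses) {
    Map<Integer, List<String>> matchedNamesByDoc = new LinkedHashMap<>();
    for (int doc = 0; doc < docs.size(); doc++) {
      List<String> names = new ArrayList<>();
      for (NamedClause clause : clauses) {
        if (clause.matcher.test(docs.get(doc))) {
          names.add(clause.name);
        }
      }
      if (!names.isEmpty()) {
        matchedNamesByDoc.put(doc, names);
      }
    }
    return matchedNamesByDoc;
  }

  public static void main(String[] args) {
    List<Map<String, String>> docs = List.of(
        Map.of("title", "lucene in action", "body", "lucene search"),
        Map.of("title", "cooking", "body", "lucene recipes"),
        Map.of("title", "gardening", "body", "roses"));
    List<NamedClause> clauses = List.of(
        new NamedClause("title_match", d -> d.getOrDefault("title", "").contains("lucene")),
        new NamedClause("body_match", d -> d.getOrDefault("body", "").contains("lucene")));
    System.out.println(collect(docs, clauses));
    // prints {0=[title_match, body_match], 1=[body_match]}
  }
}
```

If you name each clause after the field it queries, the per-hit list of names is effectively the list of matched fields. A real implementation would hook into scorers during collection instead of re-evaluating predicates per document.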

Uwe

On 27.06.2022 at 11:51, Shai Erera wrote:
Out of curiosity and for education purposes, is the Collector approach I proposed wrong/inefficient? Or less efficient than the matches() API?

I'm thinking, if you want to both match/rank documents and as a side effect know which fields matched, the Collector will perform better than Weight.matches(), but I could be wrong.

Shai

On Mon, Jun 27, 2022 at 11:57 AM Dawid Weiss <dawid.we...@gmail.com> wrote:

    The matches API is awesome. Use it. You can also get a rough glimpse
    into a superset of fields potentially matching the query via:

        query.visit(
            new QueryVisitor() {
              @Override
              public boolean acceptField(String field) {
                affectedFields.add(field);
                return false;
              }
            });

    
    https://lucene.apache.org/core/9_2_0/core/org/apache/lucene/search/Query.html#visit(org.apache.lucene.search.QueryVisitor)

    I'd go with the Matches API though.

    Dawid

    On Mon, Jun 27, 2022 at 10:48 AM Alan Woodward
    <romseyg...@gmail.com> wrote:
    >
    > The Matches API will give you this information - it’s still
    > likely to be fairly slow, but it’s a lot easier to use than trying
    > to parse Explain output.
    >
    > Query query = ….;
    > Weight w = searcher.createWeight(
    >     searcher.rewrite(query), ScoreMode.COMPLETE_NO_SCORES, 1.0f);
    >
    > // matches() returns null if the document does not match the query
    > Matches m = w.matches(context, doc);
    > List<String> matchingFields = new ArrayList<>();
    > if (m != null) {
    >   for (String field : m) {
    >     matchingFields.add(field);
    >   }
    > }
    >
    > Bear in mind that `matches` doesn’t maintain any state between
    > calls, so calling it for every matching document is likely to be
    > slow; for those cases Shai’s suggestion of using a Collector and
    > examining low-level scorers will perform better, but it won’t work
    > for every query type.
    >
    >
    > > On 25 Jun 2022, at 04:14, Yichen Sun <yiche...@bu.edu> wrote:
    > >
    > > Hello!
    > >
    > > I’m an MSCS student at BU learning to use Lucene. Recently I have
    > > been trying to output the fields matched by a query. For example,
    > > if a document has 10 fields and 2 of them match the query, I want
    > > to get the names of those 2 fields.
    > >
    > > I have tried calling the explain() method and running a regex over
    > > the description. However, it takes too much time.
    > >
    > > I wonder what the efficient way to get the matched fields is.
    > > Would you please offer some help? Thank you so much!
    > >
    > > Best regards,
    > > Yichen Sun
    >
    >
    >
    ---------------------------------------------------------------------
    > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
    > For additional commands, e-mail: dev-h...@lucene.apache.org
    >


--
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail:u...@thetaphi.de
