I think the collector approach is perfectly fine for mass-processing of
queries.
By the way: Elasticsearch/OpenSearch already have such a feature built
in and, as far as I remember, it works on top of the collector API in a
way similar to what you described. It is a bit different in that you can
tag any clause in a BooleanQuery (so any query) with a "name" (they call
it a "named query",
https://www.elastic.co/guide/en/elasticsearch/reference/8.2/query-dsl-bool-query.html#named-queries).
When you get the search results, each hit tells you which named queries
matched it. The actual implementation is a wrapper query around each of
those clauses that carries the name. During hit collection it just
collects all named query instances found in the query tree. I think the
wrapper query's scorer implementation somehow adds the name to some
global state.
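
Purely to illustrate the idea, here is a Lucene-free toy model in plain
Java. It is not the Elasticsearch code, and NamedClauseDemo, NamedClause,
contains and matchedNames are made-up names: each clause is wrapped
together with its name, and for every hit you collect the names of the
wrappers that matched.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Toy model of "named queries"; hypothetical names, no Lucene/Elasticsearch API involved. */
class NamedClauseDemo {

  /** A clause wrapped together with the name it was tagged with. */
  static final class NamedClause {
    final String name;
    final Predicate<Map<String, String>> clause;

    NamedClause(String name, Predicate<Map<String, String>> clause) {
      this.name = name;
      this.clause = clause;
    }
  }

  /** Builds a named clause that matches when the field exists and contains the term. */
  static NamedClause contains(String name, String field, String term) {
    return new NamedClause(name, doc -> doc.getOrDefault(field, "").contains(term));
  }

  /** Returns the names of all tagged clauses that match the document, in clause order. */
  static List<String> matchedNames(List<NamedClause> clauses, Map<String, String> doc) {
    List<String> names = new ArrayList<>();
    for (NamedClause c : clauses) {
      if (c.clause.test(doc)) {
        names.add(c.name);
      }
    }
    return names;
  }

  public static void main(String[] args) {
    List<NamedClause> clauses = List.of(
        contains("title_match", "title", "lucene"),
        contains("body_match", "body", "lucene"));
    Map<String, String> doc = Map.of("title", "lucene in action", "body", "a search library");
    System.out.println(matchedNames(clauses, doc)); // prints [title_match]
  }
}
```

The toy re-tests every clause per document; as described above, the real
feature instead picks the names up from the wrapper queries during hit
collection.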
Uwe
On 27.06.2022 at 11:51, Shai Erera wrote:
Out of curiosity and for education purposes, is the Collector approach
I proposed wrong/inefficient? Or less efficient than the matches() API?
I'm thinking, if you want to both match/rank documents and as a side
effect know which fields matched, the Collector will perform better
than Weight.matches(), but I could be wrong.
Shai
On Mon, Jun 27, 2022 at 11:57 AM Dawid Weiss <dawid.we...@gmail.com>
wrote:
The matches API is awesome. Use it. You can also get a rough glimpse
into a superset of fields potentially matching the query via:
    Set<String> affectedFields = new HashSet<>();
    query.visit(
        new QueryVisitor() {
          @Override
          public boolean acceptField(String field) {
            affectedFields.add(field);
            return false; // the field name is all we need; skip its terms
          }
        });
https://lucene.apache.org/core/9_2_0/core/org/apache/lucene/search/Query.html#visit(org.apache.lucene.search.QueryVisitor)
I'd go with the Matches API though.
Dawid
On Mon, Jun 27, 2022 at 10:48 AM Alan Woodward
<romseyg...@gmail.com> wrote:
>
> The Matches API will give you this information - it’s still likely to
> be fairly slow, but it’s a lot easier to use than trying to parse
> Explain output.
>
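> Query q = ….;
> Weight w = searcher.createWeight(searcher.rewrite(q),
>     ScoreMode.COMPLETE_NO_SCORES, 1.0f);
>
> // context is the LeafReaderContext of the segment holding the hit;
> // doc is the hit's doc ID relative to that segment
> Matches m = w.matches(context, doc);
> List<String> matchingFields = new ArrayList<>();
> if (m != null) { // matches() returns null if the doc does not match
>   for (String field : m) {
>     matchingFields.add(field);
>   }
> }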
>
> Bear in mind that `matches` doesn’t maintain any state between calls,
> so calling it for every matching document is likely to be slow; for
> those cases Shai’s suggestion of using a Collector and examining
> low-level scorers will perform better, but it won’t work for every
> query type.
>
>
> > On 25 Jun 2022, at 04:14, Yichen Sun <yiche...@bu.edu> wrote:
> >
> > Hello!
> >
> > I’m an MSCS student at BU and am learning to use Lucene. Recently
> > I have been trying to output the fields matched by a query. For
> > example, a document has 10 fields and 2 of them match the query;
> > I want to get the names of those fields.
> >
> > I have tried calling the explain() method and running a regex over
> > the description, but it takes too much time.
> >
> > I wonder what an efficient way to get the matched fields would be.
> > Would you please offer some help? Thank you so much!
> >
> > Best regards,
> > Yichen Sun
>
>
>
---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
--
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail:u...@thetaphi.de