[ https://issues.apache.org/jira/browse/LUCENE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16417849#comment-16417849 ]
David Smiley commented on LUCENE-8229:
--------------------------------------
This is really interesting [~romseygeek]!
Here's your proposed signature: {{public MatchesIterator matches(LeafReaderContext context, int doc, String field) throws IOException}}
* I'm unsure about this new {{matches}} method requiring a field reference, which restricts the reported matches to the single field named in this argument. A caller might want all fields, or perhaps just some. This could easily be converted to a {{Predicate<String>}} that matches the field.
* Add payloads to {{MatchesIterator}}
* Perhaps {{matches}} should take an int for the {{PostingsEnum}} flags. This way it could choose to ask for offsets and/or payloads. Or maybe just always get both to keep the API simpler, assuming the perf difference is negligible for practical uses of this feature (which sounds plausible to me). A flags parameter could be added later if desired. Yeah, let's not add it now then.
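To make the {{Predicate<String>}} suggestion concrete, here is a minimal, self-contained sketch. It deliberately avoids Lucene. The document is a hypothetical in-memory map from field name to tokens. {{matches}} is a hypothetical single-term matcher that returns an eager list instead of a lazy {{MatchesIterator}}, and {{Match}} is an illustrative record. The only point is the shape of the field argument.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;

public class FieldPredicateSketch {

  /** Hypothetical match record: which field matched and the token position range. */
  record Match(String field, int startPosition, int endPosition) {}

  /**
   * Hypothetical single-term matcher over one in-memory document (field name -> tokens).
   * Takes a Predicate<String> instead of one String field, so callers can ask for all
   * fields or just some. Returns an eager list rather than a lazy iterator, for brevity.
   */
  static List<Match> matches(Map<String, List<String>> doc, String term,
                             Predicate<String> fieldFilter) {
    List<Match> out = new ArrayList<>();
    for (Map.Entry<String, List<String>> e : doc.entrySet()) {
      if (!fieldFilter.test(e.getKey())) {
        continue; // the caller did not ask for this field
      }
      List<String> tokens = e.getValue();
      for (int pos = 0; pos < tokens.size(); pos++) {
        if (tokens.get(pos).equals(term)) {
          out.add(new Match(e.getKey(), pos, pos)); // a single term spans one position
        }
      }
    }
    return out;
  }

  public static void main(String[] args) {
    // TreeMap gives a deterministic field order (alphabetical: body, then title).
    Map<String, List<String>> doc = new TreeMap<>(Map.of(
        "title", List.of("lucene", "matches"),
        "body", List.of("find", "lucene", "matches", "in", "lucene")));

    System.out.println(matches(doc, "lucene", f -> true));      // all fields
    System.out.println(matches(doc, "lucene", "body"::equals)); // just one field
  }
}
```

Passing {{f -> true}} covers the all-fields case, and a single field is just {{"body"::equals}}. That makes the {{String field}} variant a special case of the predicate one.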
Have you considered a very different approach: modifying Scorer to expose more information about the matches in a document? I'm just thinking out loud here; it might be a bad idea ;-). Maybe I'm saying the same thing as "adding positions to Scorers", which you reference in the description, but maybe it could hang off the Scorer indirectly via the {{MatchesIterator}} you developed here. Your proposed {{Weight.matches(...)}} is a visitor-like thing, and we already have Scorer doing that. Lots of Weight classes would need to be modified; I wonder if it's less invasive at the Scorer level? Hmm.
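A rough sketch of the Scorer-side alternative, again with no Lucene dependency. {{TermScorer}}, {{MatchesIterator}} and {{matches()}} here are hypothetical toy types, not the existing Lucene classes. The scorer iterates documents as usual and, only when asked, exposes the match positions for the document it is currently positioned on.

```java
import java.util.ArrayList;
import java.util.List;

public class ScorerMatchesSketch {

  /** Hypothetical minimal iterator over match positions in the current document. */
  interface MatchesIterator {
    boolean next(); // advance to the next match; false when exhausted
    int startPosition();
    int endPosition();
  }

  /** Hypothetical single-term scorer over in-memory documents (each a token list). */
  static final class TermScorer {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;
    private final List<List<String>> docs;
    private final String term;
    private int doc = -1;

    TermScorer(List<List<String>> docs, String term) {
      this.docs = docs;
      this.term = term;
    }

    /** Advance to the next document containing the term. */
    int nextDoc() {
      if (doc == NO_MORE_DOCS) {
        return doc; // already exhausted
      }
      for (doc = doc + 1; doc < docs.size(); doc++) {
        if (docs.get(doc).contains(term)) {
          return doc;
        }
      }
      return doc = NO_MORE_DOCS;
    }

    /** Positions are computed on demand, only for the current doc. */
    MatchesIterator matches() {
      List<String> tokens = docs.get(doc);
      List<Integer> positions = new ArrayList<>();
      for (int i = 0; i < tokens.size(); i++) {
        if (tokens.get(i).equals(term)) {
          positions.add(i);
        }
      }
      return new MatchesIterator() {
        int idx = -1;
        public boolean next() { return ++idx < positions.size(); }
        public int startPosition() { return positions.get(idx); }
        public int endPosition() { return positions.get(idx); } // single term
      };
    }
  }

  public static void main(String[] args) {
    List<List<String>> docs = List.of(
        List.of("a", "b"),
        List.of("x", "lucene", "y", "lucene"),
        List.of("lucene"));
    TermScorer scorer = new TermScorer(docs, "lucene");
    for (int d = scorer.nextDoc(); d != TermScorer.NO_MORE_DOCS; d = scorer.nextDoc()) {
      MatchesIterator it = scorer.matches();
      while (it.next()) {
        System.out.println("doc " + d + " pos " + it.startPosition());
      }
    }
  }
}
```

Because {{matches()}} is only computed on demand, a caller that never asks for it pays nothing extra in the scoring loop. This toy does not show whether that holds for every real Scorer implementation.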
> Add a method to Weight to retrieve matches for a single document
> ----------------------------------------------------------------
>
> Key: LUCENE-8229
> URL: https://issues.apache.org/jira/browse/LUCENE-8229
> Project: Lucene - Core
> Issue Type: New Feature
> Reporter: Alan Woodward
> Assignee: Alan Woodward
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> The ability to find out exactly what a query has matched on is a fairly
> frequent feature request, and would also make highlighters much easier to
> implement. There have been a few attempts at doing this, including adding
> positions to Scorers, or rewriting queries as Spans, but these all either
> compromise general performance or involve up-front knowledge of all queries.
> Instead, I propose adding a method to Weight that exposes an iterator over
> matches in a particular document and field. It should be used in a similar
> manner to explain(), i.e. just for TopDocs, not as part of the scoring loop,
> which relieves some of the pressure on performance.
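The intended usage is a second pass over only the top hits, like {{explain()}}. It might look roughly like this toy sketch. The assumptions are as follows. {{TermWeight}}, {{score}}, {{matches}}, {{highlight}} and {{searchAndHighlight}} are hypothetical names. Documents are plain token lists, and the "score" is just term frequency. The cheap scoring pass records no positions. Positions are computed only for the top {{n}} documents that get highlighted.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class TopDocsMatchesSketch {

  /** Hypothetical toy "weight" for a single term over token-list documents. */
  static final class TermWeight {
    final String term;

    TermWeight(String term) { this.term = term; }

    /** Cheap scoring pass: term frequency only, no positions kept. */
    int score(List<String> doc) {
      int freq = 0;
      for (String t : doc) {
        if (t.equals(term)) freq++;
      }
      return freq;
    }

    /** Per-document pass, like explain(): positions of every match. */
    List<Integer> matches(List<String> doc) {
      List<Integer> positions = new ArrayList<>();
      for (int i = 0; i < doc.size(); i++) {
        if (doc.get(i).equals(term)) positions.add(i);
      }
      return positions;
    }
  }

  /** Wrap matched tokens in <b> tags. */
  static String highlight(List<String> doc, List<Integer> positions) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < doc.size(); i++) {
      if (i > 0) sb.append(' ');
      String tok = doc.get(i);
      sb.append(positions.contains(i) ? "<b>" + tok + "</b>" : tok);
    }
    return sb.toString();
  }

  /** Score every doc, keep the top n, then compute matches only for those. */
  static List<String> searchAndHighlight(List<List<String>> docs, String term, int n) {
    TermWeight w = new TermWeight(term);
    List<Integer> top = IntStream.range(0, docs.size())
        .filter(d -> w.score(docs.get(d)) > 0)
        .boxed()
        .sorted(Comparator.comparingInt((Integer d) -> w.score(docs.get(d))).reversed())
        .limit(n)
        .collect(Collectors.toList());
    List<String> out = new ArrayList<>();
    for (int d : top) {
      out.add(highlight(docs.get(d), w.matches(docs.get(d))));
    }
    return out;
  }

  public static void main(String[] args) {
    List<List<String>> docs = List.of(
        List.of("a", "b"),
        List.of("lucene", "x"),
        List.of("lucene", "y", "lucene"));
    searchAndHighlight(docs, "lucene", 2).forEach(System.out::println);
  }
}
```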