[
https://issues.apache.org/jira/browse/LUCENE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16427185#comment-16427185
]
David Smiley commented on LUCENE-8229:
--------------------------------------
It's really looking great, Alan. I looked over your patch a bit more...
* I wonder if "Matches" sounds too generic; perhaps "PositionMatches" to
emphasize it has position information and not simply matching document IDs?
* It's a shame that every Weight must implement this (no default impl) because
even a no-match response requires knowledge of the field. Is it really
important to know the field? I suppose it might be useful for figuring out
generically which fields a query references... but no, not really, because you
have to execute it on a matching document first to even figure that out with
this API.
* Matcher.EMPTY (an empty version of MatchesIterator) should perhaps be moved to
MatchesIterator? Come to think of it, maybe MatchesIterator could be
Matches.Iterator (inner class of Matches)? (avoids polluting the busy .search
namespace).
* RE payloads: I appreciate you want to keep things simple for now. I've heard
of putting OCR document offset information in them, for example, and a
highlighter might want this. A highlighter might want whatever metadata is
being put in a payload, even if it is relevancy oriented -- consider a
relevancy debugger tool that could show you what's in the payload. This might
not even be a "highlighter" per se.
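
To make the naming and nesting suggestions above concrete, here is a rough sketch of what I have in mind. It is self-contained and does not use any Lucene classes; PositionMatches, its nested Iterator, and the MatchesDemo helper are all hypothetical names for illustration, not what the patch contains.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only: none of these names come from the patch.
interface PositionMatches {

  /** Iterator over match positions in one field, nested to keep the package namespace small. */
  interface Iterator {
    /** Advances to the next match; returns false once exhausted. */
    boolean next();
    int startPosition();
    int endPosition();

    /** Shared iterator that never matches, living next to the type it implements. */
    Iterator EMPTY = new Iterator() {
      @Override public boolean next() { return false; }
      @Override public int startPosition() { throw new IllegalStateException("no current match"); }
      @Override public int endPosition() { throw new IllegalStateException("no current match"); }
    };
  }

  /** Returns the matches in the given field, or Iterator.EMPTY if the field has none. */
  Iterator getMatches(String field);
}

final class MatchesDemo {
  /** Toy PositionMatches with single-position matches in exactly one field. */
  static PositionMatches singleField(String field, int... positions) {
    return f -> {
      if (!field.equals(f)) {
        return PositionMatches.Iterator.EMPTY;
      }
      return new PositionMatches.Iterator() {
        private int i = -1;
        @Override public boolean next() { return ++i < positions.length; }
        @Override public int startPosition() { return positions[i]; }
        @Override public int endPosition() { return positions[i]; }
      };
    };
  }

  /** Drains the iterator for a field and returns the start positions it reported. */
  static List<Integer> starts(PositionMatches matches, String field) {
    List<Integer> out = new ArrayList<>();
    PositionMatches.Iterator it = matches.getMatches(field);
    while (it.next()) {
      out.add(it.startPosition());
    }
    return out;
  }
}
```

With this shape, a caller asking about a field that has no matches gets the shared EMPTY instance back and its loop simply never runs.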
> Add a method to Weight to retrieve matches for a single document
> ----------------------------------------------------------------
>
> Key: LUCENE-8229
> URL: https://issues.apache.org/jira/browse/LUCENE-8229
> Project: Lucene - Core
> Issue Type: New Feature
> Reporter: Alan Woodward
> Assignee: Alan Woodward
> Priority: Major
> Attachments: LUCENE-8229.patch
>
> Time Spent: 1.5h
> Remaining Estimate: 0h
>
> The ability to find out exactly what a query has matched on is a fairly
> frequent feature request, and would also make highlighters much easier to
> implement. There have been a few attempts at doing this, including adding
> positions to Scorers, or re-writing queries as Spans, but these all either
> compromise general performance or involve up-front knowledge of all queries.
> Instead, I propose adding a method to Weight that exposes an iterator over
> matches in a particular document and field. It should be used in a similar
> manner to explain(), i.e., just for TopDocs, not as part of the scoring loop,
> which relieves some of the pressure on performance.
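
The usage pattern described in the issue (ask for matches only on the top hits, the way explain() is used, rather than inside the scoring loop) could be sketched roughly as follows. This is a toy model under stated assumptions: MatchIterator, MatchingWeight, and TopHitMatches are hypothetical stand-ins, not Lucene classes, and the method signature is a guess at the general shape rather than the one in the patch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins; these are not Lucene classes.
interface MatchIterator {
  /** Advances to the next match; returns false once exhausted. */
  boolean next();
  int position();
}

interface MatchingWeight {
  /** Iterator over matches for one document and field; yields nothing if there is no match. */
  MatchIterator matches(int doc, String field);
}

final class TopHitMatches {
  /** Collects match positions for the given top hits only, after scoring has finished. */
  static Map<Integer, List<Integer>> collect(MatchingWeight weight, int[] topDocs, String field) {
    Map<Integer, List<Integer>> out = new LinkedHashMap<>();
    for (int doc : topDocs) {
      MatchIterator it = weight.matches(doc, field);
      List<Integer> positions = new ArrayList<>();
      while (it.next()) {
        positions.add(it.position());
      }
      if (!positions.isEmpty()) {
        out.put(doc, positions);
      }
    }
    return out;
  }

  /** Toy weight backed by an in-memory map of doc id to positions for a single field. */
  static MatchingWeight toyWeight(String field, Map<Integer, int[]> postings) {
    return (doc, f) -> {
      int[] p = field.equals(f) ? postings.getOrDefault(doc, new int[0]) : new int[0];
      return new MatchIterator() {
        private int i = -1;
        @Override public boolean next() { return ++i < p.length; }
        @Override public int position() { return p[i]; }
      };
    };
  }
}
```

The cost of iterating positions is paid once per displayed hit, not once per scored document, which is the performance argument the description makes.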