[jira] [Commented] (LUCENE-8229) Add a method to Weight to retrieve matches for a single document

David Smiley (JIRA) Wed, 28 Mar 2018 11:03:39 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16417849#comment-16417849
 ]


David Smiley commented on LUCENE-8229:
--------------------------------------

This is really interesting [~romseygeek]!

Here's your proposed signature: {{public MatchesIterator 
matches(LeafReaderContext context, int doc, String field) throws IOException}}

* I'm unsure about this new {{matches}} method requiring a single field 
argument, which restricts the results to query clauses on that one field.  A 
caller might want all fields, or perhaps just some.  The argument could easily 
be converted to a {{Predicate<String>}} that selects the fields to match.
* Add payloads to {{MatchesIterator}}
* Perhaps {{matches}} should take an int for the PostingsEnum flags.  This way 
it could choose to ask for offsets and/or payloads.  Or maybe just always get 
both to keep the API simpler, assuming the perf difference is negligible for 
practical uses of this feature (which sounds plausible to me).  It could be 
added later if desired.  Yeah, let's not do it now then.
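
To make the {{Predicate<String>}} idea in the first bullet concrete, here is a minimal sketch in plain Java.  It does not use any Lucene classes: {{Hit}}, {{matches}} and {{FieldFilterSketch}} are hypothetical stand-ins invented for illustration, not part of the patch.  It only shows that one predicate parameter can cover the single-field, all-fields and some-fields cases.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

public class FieldFilterSketch {

    /** Hypothetical stand-in for one match: a field name plus a term position. */
    public static final class Hit {
        final String field;
        final int position;

        public Hit(String field, int position) {
            this.field = field;
            this.position = position;
        }

        @Override
        public String toString() {
            return field + "@" + position;
        }
    }

    /**
     * Hypothetical analogue of a field-filtered matches call: keeps only the
     * hits whose field name is accepted by the predicate, in input order.
     */
    public static List<String> matches(List<Hit> hits, Predicate<String> fieldFilter) {
        List<String> out = new ArrayList<>();
        for (Hit h : hits) {
            if (fieldFilter.test(h.field)) {
                out.add(h.toString());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(
                new Hit("title", 0), new Hit("body", 3), new Hit("tags", 1));

        // Single field: equivalent to passing a String field argument.
        System.out.println(matches(hits, "body"::equals));

        // All fields: the predicate accepts everything.
        System.out.println(matches(hits, f -> true));

        // Some fields: membership in a caller-supplied set.
        Set<String> wanted = new HashSet<>(Arrays.asList("title", "tags"));
        System.out.println(matches(hits, wanted::contains));
    }
}
```

A caller that only cares about one field passes {{"body"::equals}}, so the single-field use stays a one-liner while the other two cases need no extra overloads.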

Have you considered a very different approach of modifying Scorer to expose 
more information about the matches in a document?  I'm just thinking out loud 
here; might be a bad idea ;-).  Maybe I'm saying the same thing as "adding 
positions to Scorers" as you reference in the description, but maybe it could 
hang off indirectly using the {{MatchesIterator}} you developed here.  Your 
proposed {{Weight.matches(...)}} is a visitor-like thing and we already have 
Scorer doing that.  There are lots of Weight classes to modify; I wonder if it 
would be less invasive to do this at the Scorer?  Hmm.


> Add a method to Weight to retrieve matches for a single document
> ----------------------------------------------------------------
>
>                 Key: LUCENE-8229
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8229
>             Project: Lucene - Core
>          Issue Type: New Feature
>            Reporter: Alan Woodward
>            Assignee: Alan Woodward
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The ability to find out exactly what a query has matched on is a fairly 
> frequent feature request, and would also make highlighters much easier to 
> implement.  There have been a few attempts at doing this, including adding 
> positions to Scorers, or re-writing queries as Spans, but these all either 
> compromise general performance or involve up-front knowledge of all queries.
> Instead, I propose adding a method to Weight that exposes an iterator over 
> matches in a particular document and field.  It should be used in a similar 
> manner to explain() - i.e., just for TopDocs, not as part of the scoring loop, 
> which relieves some of the pressure on performance.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

