[ 
https://issues.apache.org/jira/browse/LUCENE-5181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13749363#comment-13749363
 ] 

Jon Stewart commented on LUCENE-5181:
-------------------------------------

Sure. I'm working in a high-recall/low-precision domain, where a large portion 
of the source documents is irrelevant junk. For review, users are often 
presented with a match-oriented table view rather than a document-oriented 
one, i.e., each row in the table represents a term match, generally with 
some context, joined with some document metadata.

I can use the PassageFormatter to get access to the Passages in a result set, 
but it is hard to generate this table view without knowing which Document goes 
with the Passage. Additionally, a research problem I'm working on is using a 
combination of match properties and Document properties (including metadata 
such as file type, creation dates, etc.) to score the individual matches. The 
properties get normalized and fed into liblinear, and out comes a score for us 
to sort on. This, too, is difficult without having the Document.
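
To make the match-oriented view concrete, here is a minimal sketch of the join 
described above, assuming each match already carries its docID. The types 
Match, DocMeta, and Row are hypothetical stand-ins invented for illustration, 
not Lucene classes, and the feature vector is left unnormalized; the 
liblinear step is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch only: all types here are hypothetical stand-ins, not Lucene classes.
class MatchTableSketch {

    /** A term match with some context, assumed to know its docID. */
    static final class Match {
        final int docId;
        final String context;
        final float passageScore;

        Match(int docId, String context, float passageScore) {
            this.docId = docId;
            this.context = context;
            this.passageScore = passageScore;
        }
    }

    /** Document-level metadata, looked up by docID. */
    static final class DocMeta {
        final String fileType;
        final long createdEpochDay;

        DocMeta(String fileType, long createdEpochDay) {
            this.fileType = fileType;
            this.createdEpochDay = createdEpochDay;
        }
    }

    /** One table row: a match joined with its document's metadata. */
    static final class Row {
        final Match match;
        final DocMeta meta;

        Row(Match match, DocMeta meta) {
            this.match = match;
            this.meta = meta;
        }
    }

    /** Joins matches to metadata by docID; matches without metadata are skipped. */
    static List<Row> join(List<Match> matches, Map<Integer, DocMeta> metaByDoc) {
        List<Row> rows = new ArrayList<>();
        for (Match m : matches) {
            DocMeta meta = metaByDoc.get(m.docId);
            if (meta != null) {
                rows.add(new Row(m, meta));
            }
        }
        return rows;
    }

    /** Raw (unnormalized) features mixing match and document properties. */
    static double[] features(Row r) {
        return new double[] {
            r.match.passageScore,
            r.match.context.length(),
            r.meta.createdEpochDay
        };
    }
}
```

Without the docID on the match, the lookup in join() has nothing to key on, 
which is the difficulty described above.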

Happy to contribute a patch if there's consensus. Passing in the docID via 
PassageFormatter.format is what I did, but that breaks backwards compatibility. 
It'd be easy enough to store the docID as a field on Passage instead.
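
A rough sketch of the field-based alternative, using stand-in classes rather 
than the real Lucene types: SketchPassage, SketchFormatter, and RowFormatter 
are hypothetical names, and the format() signature is only loosely modeled on 
PassageFormatter.format. The point is that the formatter signature stays the 
same while each passage carries its own docID.

```java
// Sketch only: stand-in classes, not the actual Lucene API.
class PassageDocIdSketch {

    /** Stand-in for Passage; docId is the proposed additional field. */
    static final class SketchPassage {
        final int startOffset;
        final int endOffset;
        final int docId; // would be set by the highlighter before formatting

        SketchPassage(int startOffset, int endOffset, int docId) {
            this.startOffset = startOffset;
            this.endOffset = endOffset;
            this.docId = docId;
        }
    }

    /** Stand-in formatter base class; its signature does not change. */
    abstract static class SketchFormatter {
        abstract String format(SketchPassage[] passages, String content);
    }

    /** Emits one "docId<TAB>snippet" line per passage. */
    static final class RowFormatter extends SketchFormatter {
        @Override
        String format(SketchPassage[] passages, String content) {
            StringBuilder sb = new StringBuilder();
            for (SketchPassage p : passages) {
                sb.append(p.docId)
                  .append('\t')
                  .append(content, p.startOffset, p.endOffset)
                  .append('\n');
            }
            return sb.toString();
        }
    }
}
```

Existing formatter subclasses would keep compiling under this approach, since 
they can simply ignore the new field.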
                
> Passage knows its own docID
> ---------------------------
>
>                 Key: LUCENE-5181
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5181
>             Project: Lucene - Core
>          Issue Type: Improvement
>    Affects Versions: 4.4
>            Reporter: Jon Stewart
>            Priority: Minor
>
> The new postingshighlight package allows for retrieval of term matches from a 
> query if one creates a class that extends PassageFormatter and overrides 
> format(). However, class Passage does not have a docID field, nor is the 
> docID provided via PassageFormatter.format(). Therefore, it's very difficult 
> to know which Document contains a given Passage.
> It would suffice for PassageFormatter.format() to be passed the docID as a 
> parameter. From the code in PostingsHighlighter, this seems like it would be 
> easy.

