[ https://issues.apache.org/jira/browse/LUCENE-5181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13749363#comment-13749363 ]
Jon Stewart commented on LUCENE-5181:
-------------------------------------

Sure. I'm working in a high recall/low precision domain, where a large portion of the source documents are irrelevant junk. For their review, users are often presented with a match-oriented table view rather than a document-oriented table view, i.e., each row in the table represents a term match, generally with some context, and is joined with some document metadata. I can use the PassageFormatter to get access to the Passages in a result set, but it is hard to generate this table view without knowing which Document goes with the Passage.

Additionally, a research problem I'm working on is using a combination of match properties and Document properties to score the individual matches (including metadata, like file type, created dates, etc.). The properties get normalized and fed into liblinear and out comes a score for us to sort on. This, too, is difficult without having the Document.

Happy to contribute a patch if there's consensus. Passing in the docID via PassageFormatter.format is what I did, but that breaks backwards compatibility. It'd be easy enough to set on Passage as a field.

> Passage knows its own docID
> ---------------------------
>
> Key: LUCENE-5181
> URL: https://issues.apache.org/jira/browse/LUCENE-5181
> Project: Lucene - Core
> Issue Type: Improvement
> Affects Versions: 4.4
> Reporter: Jon Stewart
> Priority: Minor
>
> The new PostingsHighlight package allows for retrieval of term matches from a
> query if one creates a class that extends PassageFormatter and overrides
> format(). However, class Passage does not have a docID field, nor is this
> provided via PassageFormatter.format(). Therefore, it's very difficult to
> know which Document contains a given Passage.
> It would suffice for PassageFormatter.format() to be passed the docID as a
> parameter. From the code in PostingsHighlight, this seems like it would be
> easy.
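
For illustration, here is a minimal sketch of the match-oriented table described in the comment above, assuming each passage carries its docID (the field proposed in this issue). All names in it (MatchTableSketch, PassageWithDoc, DocMeta, MatchRow, buildRows) are hypothetical stand-ins and not Lucene types. In real use the passages would come from the highlighter rather than being built by hand.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MatchTableSketch {

    // Hypothetical stand-in for a passage that knows its docID.
    // NOT Lucene's Passage class; docId is the addition under discussion.
    static final class PassageWithDoc {
        final int docId;
        final int startOffset;
        final int endOffset;

        PassageWithDoc(int docId, int startOffset, int endOffset) {
            this.docId = docId;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
    }

    // Hypothetical per-document metadata (file type, created date).
    static final class DocMeta {
        final String fileType;
        final String created;

        DocMeta(String fileType, String created) {
            this.fileType = fileType;
            this.created = created;
        }
    }

    // One row of the match-oriented view: a match plus its document's metadata.
    static final class MatchRow {
        final int docId;
        final String context;
        final String fileType;
        final String created;

        MatchRow(int docId, String context, String fileType, String created) {
            this.docId = docId;
            this.context = context;
            this.fileType = fileType;
            this.created = created;
        }
    }

    // Join each passage with its document's text and metadata via docId.
    // Passages whose document is unknown, or whose offsets are empty, are skipped.
    static List<MatchRow> buildRows(List<PassageWithDoc> passages,
                                    Map<Integer, String> contentByDoc,
                                    Map<Integer, DocMeta> metaByDoc) {
        List<MatchRow> rows = new ArrayList<>();
        for (PassageWithDoc p : passages) {
            String content = contentByDoc.get(p.docId);
            DocMeta meta = metaByDoc.get(p.docId);
            if (content == null || meta == null) {
                continue;
            }
            int start = Math.max(0, p.startOffset);
            int end = Math.min(content.length(), p.endOffset);
            if (start >= end) {
                continue;
            }
            rows.add(new MatchRow(p.docId, content.substring(start, end),
                                  meta.fileType, meta.created));
        }
        return rows;
    }

    public static void main(String[] args) {
        Map<Integer, String> content = new HashMap<>();
        content.put(7, "quarterly budget report draft");
        content.put(9, "budget meeting notes");

        Map<Integer, DocMeta> meta = new HashMap<>();
        meta.put(7, new DocMeta("pdf", "2013-08-01"));
        meta.put(9, new DocMeta("txt", "2013-08-15"));

        List<PassageWithDoc> passages = Arrays.asList(
            new PassageWithDoc(7, 10, 16),
            new PassageWithDoc(9, 0, 6));

        for (MatchRow row : buildRows(passages, content, meta)) {
            System.out.println(row.docId + "\t" + row.context + "\t"
                + row.fileType + "\t" + row.created);
        }
    }
}
```

Without a docID on the passage (or passed to the formatter), the join in buildRows has no key to work with, which is the difficulty the comment describes. The same row structure could also supply the combined match and document features for scoring.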