[ 
https://issues.apache.org/jira/browse/LUCENE-6796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739479#comment-14739479
 ] 

David Smiley commented on LUCENE-6796:
--------------------------------------

This is a fundamental limitation in how WeightedSpanTermExtractor (WSTE) maps 
positions of SpanQueries to terms.  It takes any SpanQuery tree and treats 
_all_ of its terms as valid anywhere within a matching position span.  The API 
doesn't expose which underlying SpanTermQuery instances actually matched within 
that position range.  There's more to it than that, since a term might be 
active for a given span position range but not at every position in it.  For 
example, imagine a SpanQuery representing this, roughly: {{"foo bar" NEAR20 "foo 
baz"}}.  Within a matching span of the requisite length, WSTE would highlight 
_all_ occurrences of foo, bar, and baz, _even those standing alone and not 
adjacent to each other as the phrases require_.
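
To make the over-highlighting concrete, here is a toy Python model of the 
{{"foo bar" NEAR20 "foo baz"}} example.  It is not Lucene code: the function 
names, the token list, and the unordered-slop rule are all assumptions made up 
for illustration.  It contrasts "highlight every query term inside the matching 
span" (roughly what WSTE does) with "highlight only the positions that actually 
took part in the match".

```python
# Toy model only; none of these names exist in Lucene.
TOKENS = "foo bar x baz y foo z bar w foo baz".split()
LEFT = ["foo", "bar"]    # the phrase "foo bar"
RIGHT = ["foo", "baz"]   # the phrase "foo baz"


def phrase_positions(tokens, phrase):
    """Start positions where `phrase` occurs as consecutive tokens."""
    n = len(phrase)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == phrase]


def near_spans(tokens, left, right, slop):
    """(span_start, span_end, left_start, right_start) for each pair of
    non-overlapping phrase matches at most `slop` positions apart."""
    spans = []
    for a in phrase_positions(tokens, left):
        for b in phrase_positions(tokens, right):
            a_end, b_end = a + len(left), b + len(right)
            if a_end <= b:
                gap = b - a_end
            elif b_end <= a:
                gap = a - b_end
            else:
                continue  # the two phrase matches overlap; skip in this sketch
            if gap <= slop:
                spans.append((min(a, b), max(a_end, b_end), a, b))
    return spans


def highlight_all_terms_in_span(tokens, left, right, slop):
    """WSTE-like behaviour: any query term inside a matching span is a hit."""
    terms = set(left) | set(right)
    hits = set()
    for start, end, _, _ in near_spans(tokens, left, right, slop):
        hits.update(i for i in range(start, end) if tokens[i] in terms)
    return sorted(hits)


def highlight_matched_positions(tokens, left, right, slop):
    """Only the positions of the phrase occurrences that formed the match."""
    hits = set()
    for _, _, a, b in near_spans(tokens, left, right, slop):
        hits.update(range(a, a + len(left)))
        hits.update(range(b, b + len(right)))
    return sorted(hits)


print(highlight_all_terms_in_span(TOKENS, LEFT, RIGHT, 20))  # [0, 1, 3, 5, 7, 9, 10]
print(highlight_matched_positions(TOKENS, LEFT, RIGHT, 20))  # [0, 1, 9, 10]
```

The stray baz at position 3, foo at 5, and bar at 7 are highlighted by the 
first function only because they fall inside the span, which is the behaviour 
described above.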

I was thinking of this problem a year ago.  I believe Lucene trunk may finally 
have the API needed with the new SpanCollector API thanks to [~romseygeek] -- 
we've conversed on the implications of this on highlighting.  I anticipate 
leveraging this somewhat soon (month or two?); stay tuned.

> Some terms incorrectly highlighted in complex SpanQuery
> -------------------------------------------------------
>
>                 Key: LUCENE-6796
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6796
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: modules/highlighter
>    Affects Versions: 5.3
>            Reporter: Tim Allison
>            Priority: Trivial
>         Attachments: LUCENE-6796-testcase.patch
>
>
> [~modassar] initially raised this on LUCENE-5205.  I'm opening this as a 
> separate issue.
> If a SpanNear is within a SpanOr, it looks like the child terms within the 
> SpanNear query are getting highlighted even if there is no match on that 
> SpanNear query...in some special cases.  Specifically, in the format of the 
> parser in LUCENE-5205 {{"(b [c z]) d\"~2"}}, which is equivalent to: find "b" 
> or the phrase "c z" within two words of "d" in either direction.
> This affects trunk. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

