[ https://issues.apache.org/jira/browse/LUCENE-6796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739479#comment-14739479 ]
David Smiley commented on LUCENE-6796:
--------------------------------------
This is a fundamental limitation in how WeightedSpanTermExtractor (WSTE) maps
positions of SpanQueries to terms. It takes any SpanQuery tree and treats
_all_ of its terms as valid anywhere within a matching position span. The API
doesn't expose which underlying SpanTermQuery instances were actually found
in that position range. And there's more to it than that: a term might be
active for a given span position range but not necessarily at every position
in it. For example, imagine a SpanQuery representing this, roughly:
{{"foo bar" NEAR20 "foo baz"}}. Within a matching span of the requisite
length, WSTE would highlight _all_ occurrences of foo, bar, and baz, _even
those standing alone, not next to each other in the phrases as shown_.
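
To make the over-highlighting concrete, here is a rough toy model in Python. It
is not Lucene code and does not use any Lucene API: the function names, the
whitespace tokenization, and the way "NEAR" is computed are all assumptions
made for illustration. It only contrasts "highlight every query term inside
the matching span" (what WSTE effectively does, per the description above)
with "highlight only the positions of the leaf phrases that actually matched"
(what a collector-style API could make possible).

```python
# Toy model only: a hypothetical stand-in for the behavior described above,
# not the actual WeightedSpanTermExtractor or SpanCollector implementation.

def phrase_starts(tokens, phrase):
    """Return the start indices where `phrase` (a list of terms) occurs exactly."""
    n = len(phrase)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == phrase]


def near_matches(tokens, left, right, slop):
    """Pair occurrences of two phrases separated by at most `slop` positions.

    Order does not matter. Each match is (a_start, a_end, b_start, b_end)
    with exclusive ends.
    """
    matches = []
    for a in phrase_starts(tokens, left):
        for b in phrase_starts(tokens, right):
            a_end, b_end = a + len(left), b + len(right)
            if a_end <= b:
                gap = b - a_end
            elif b_end <= a:
                gap = a - b_end
            else:
                continue  # overlapping occurrences are ignored in this sketch
            if gap <= slop:
                matches.append((a, a_end, b, b_end))
    return matches


def highlight_all_terms_in_span(tokens, left, right, slop):
    """Over-eager approach: every query term inside a matching span is a hit."""
    terms = set(left) | set(right)
    hits = set()
    for a, a_end, b, b_end in near_matches(tokens, left, right, slop):
        start, end = min(a, b), max(a_end, b_end)
        hits.update(i for i in range(start, end) if tokens[i] in terms)
    return sorted(hits)


def highlight_matched_leaves(tokens, left, right, slop):
    """Stricter approach: only positions of the phrase occurrences that matched."""
    hits = set()
    for a, a_end, b, b_end in near_matches(tokens, left, right, slop):
        hits.update(range(a, a_end))
        hits.update(range(b, b_end))
    return sorted(hits)


tokens = "foo bar x baz y foo z bar w foo baz".split()
left, right = ["foo", "bar"], ["foo", "baz"]

print(highlight_all_terms_in_span(tokens, left, right, 20))  # [0, 1, 3, 5, 7, 9, 10]
print(highlight_matched_leaves(tokens, left, right, 20))     # [0, 1, 9, 10]
```

In this sample text the stray baz at position 3, foo at 5, and bar at 7 are
not part of either phrase, yet the first function marks them because they
fall inside the overall matching span.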

I was thinking about this problem a year ago. I believe Lucene trunk may
finally have the API needed, the new SpanCollector API, thanks to
[~romseygeek] -- we've discussed its implications for highlighting. I
anticipate leveraging this somewhat soon (a month or two?); stay tuned.

> Some terms incorrectly highlighted in complex SpanQuery
> -------------------------------------------------------
>
> Key: LUCENE-6796
> URL: https://issues.apache.org/jira/browse/LUCENE-6796
> Project: Lucene - Core
> Issue Type: Bug
> Components: modules/highlighter
> Affects Versions: 5.3
> Reporter: Tim Allison
> Priority: Trivial
> Attachments: LUCENE-6796-testcase.patch
>
>
> [~modassar] initially raised this on LUCENE-5205. I'm opening this as a
> separate issue.
> If a SpanNear is within a SpanOr, it looks like the child terms within the
> SpanNear query are getting highlighted even if there is no match on that
> SpanNear query, at least in some special cases. Specifically, in the format
> of the parser in LUCENE-5205: {{"(b [c z]) d\"~2"}}, which is equivalent to:
> find "b" or the phrase "c z" within two words of "d", in either direction.
> This affects trunk.