Ryan Lauck created LUCENE-4734:
----------------------------------
Summary: FastVectorHighlighter Overlapping Proximity Queries Do Not Highlight
Key: LUCENE-4734
URL: https://issues.apache.org/jira/browse/LUCENE-4734
Project: Lucene - Core
Issue Type: Bug
Components: modules/highlighter
Affects Versions: 4.1, 4.0
Reporter: Ryan Lauck
If a proximity phrase query overlaps with any other query term (that is, another
query term occurs in the text between the phrase's terms), the phrase will not be
highlighted.
Example Text: A B C D E F G
Example Queries:
"B E"~10 D
(only D will be highlighted)
"B E"~10 "C F"~10
(neither phrase will be highlighted)
This can be traced to the inner while loop in the FieldPhraseList constructor. For
the first example query, the first TermInfo popped off the stack will be "B".
The second TermInfo will be "D", which is not found in the submap for "B E"~10
and therefore triggers a failed match.
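To make the failure concrete, here is a heavily simplified model of that matching loop. It is a sketch under assumptions, not the real FieldPhraseList code: the classes CurrentMatchSketch, Node (standing in for QueryPhraseMap) and Tok (standing in for TermInfo) are hypothetical, the stack is assumed to hold only query terms and to pop them in position order, and the real loop's back-off step for partially matched phrases is left out.

```java
import java.util.*;

// Hypothetical, simplified model of the current matching loop; NOT the real
// FieldPhraseList code. Terms are popped from the front in position order.
class CurrentMatchSketch {
    // Stand-in for TermInfo: term text plus its position in the field.
    static final class Tok {
        final String text;
        final int pos;
        Tok(String text, int pos) { this.text = text; this.pos = pos; }
    }

    // Stand-in for QueryPhraseMap: a trie keyed by term text.
    static final class Node {
        final Map<String, Node> children = new HashMap<>();
        boolean terminal; // a query term or phrase ends at this node
        int slop;         // slop of the phrase that ends here

        Node child(String t) { return children.computeIfAbsent(t, k -> new Node()); }
    }

    // Registers a phrase; a single term models a plain term query.
    static void addPhrase(Node root, int slop, String... terms) {
        Node n = root;
        for (String t : terms) n = n.child(t);
        n.terminal = true;
        n.slop = slop;
    }

    // Simplified check: node ends a query phrase and every gap fits the slop.
    static boolean isValid(Node n, List<Tok> cand) {
        if (!n.terminal) return false;
        for (int i = 1; i < cand.size(); i++) {
            if (cand.get(i).pos - cand.get(i - 1).pos - 1 > n.slop) return false;
        }
        return true;
    }

    static List<String> match(Node root, Deque<Tok> stack) {
        List<String> phrases = new ArrayList<>();
        while (!stack.isEmpty()) {
            Tok ti = stack.pollFirst();
            Node curr = root.children.get(ti.text);
            if (curr == null) continue; // term cannot start a query phrase
            List<Tok> cand = new ArrayList<>(List.of(ti));
            while (true) {
                ti = stack.pollFirst();
                Node next = (ti == null) ? null : curr.children.get(ti.text);
                if (next == null) {
                    // The very next term is not the next phrase term: give up.
                    if (ti != null) stack.addFirst(ti);
                    if (isValid(curr, cand)) phrases.add(join(cand));
                    break;
                }
                cand.add(ti);
                curr = next;
            }
        }
        return phrases;
    }

    static String join(List<Tok> cand) {
        StringJoiner sj = new StringJoiner(" ");
        for (Tok t : cand) sj.add(t.text);
        return sj.toString();
    }
}
```

Registering "B E"~10 and D and feeding B, D and E at positions 1, 3 and 4 (0-based positions in A B C D E F G) yields only [d]. Registering "B E"~10 and "C F"~10 and feeding B, C, E and F at positions 1, 2, 4 and 5 yields an empty list, mirroring both reported symptoms.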
I wanted to report this issue before digging into a solution, but my first
thought is:
Add an additional int property to QueryPhraseMap to store the maximum possible
phrase width for each term, based on any proximity searches it is part of
(defaulting to zero; in the above examples it would be 10).
If a term popped off the stack is not part of the proximity phrase being
matched (currMap.getTermMap(ti.getText()) == null), it is added to a temporary
list until either the longest possible phrase is successfully matched or a term
is found outside the maximum possible phrase width.
After this search is complete, any non-matching terms that were added to the
temporary list are pushed back onto the stack to be evaluated again, and the
temporary list is cleared.
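The proposal can be illustrated on the same kind of simplified model. This is a sketch of the idea under assumptions, not a patch: all names are hypothetical, maxWidth holds the largest slop of any proximity phrase a node belongs to (10 in both examples, as suggested above), a skipped term counts as inside the window when its position is at most maxWidth past the phrase's first term, the search also stops once no longer phrase is possible, and any overlap filtering between matched phrases is not modeled.

```java
import java.util.*;

// Hypothetical sketch of the proposed fix on a simplified model; NOT a patch
// against the real FieldPhraseList/QueryPhraseMap classes.
class DeferredMatchSketch {
    static final class Tok {
        final String text;
        final int pos;
        Tok(String text, int pos) { this.text = text; this.pos = pos; }
    }

    static final class Node {
        final Map<String, Node> children = new HashMap<>();
        boolean terminal; // a query term or phrase ends at this node
        int slop;         // slop of the phrase that ends here
        int maxWidth;     // largest slop of any phrase through this node (default 0)

        Node child(String t) { return children.computeIfAbsent(t, k -> new Node()); }
    }

    static void addPhrase(Node root, int slop, String... terms) {
        Node n = root;
        for (String t : terms) {
            n = n.child(t);
            n.maxWidth = Math.max(n.maxWidth, slop);
        }
        n.terminal = true;
        n.slop = slop;
    }

    static boolean isValid(Node n, List<Tok> cand) {
        if (!n.terminal) return false;
        for (int i = 1; i < cand.size(); i++) {
            if (cand.get(i).pos - cand.get(i - 1).pos - 1 > n.slop) return false;
        }
        return true;
    }

    static List<String> match(Node root, Deque<Tok> stack) {
        List<String> phrases = new ArrayList<>();
        while (!stack.isEmpty()) {
            Tok first = stack.pollFirst();
            Node curr = root.children.get(first.text);
            if (curr == null) continue;
            List<Tok> cand = new ArrayList<>(List.of(first));
            List<Tok> deferred = new ArrayList<>(); // the proposal's temporary list
            // Keep searching while a longer phrase is still possible.
            while (!stack.isEmpty() && !curr.children.isEmpty()) {
                Tok ti = stack.pollFirst();
                Node next = curr.children.get(ti.text);
                if (next != null) {
                    cand.add(ti);                 // extends the phrase
                    curr = next;
                } else if (ti.pos - first.pos <= curr.maxWidth) {
                    deferred.add(ti);             // inside the window: set aside
                } else {
                    stack.addFirst(ti);           // outside the window: stop
                    break;
                }
            }
            if (isValid(curr, cand)) {
                phrases.add(join(cand));
            } else {
                // Matched terms other than the first get another chance too.
                deferred.addAll(cand.subList(1, cand.size()));
                deferred.sort((a, b) -> Integer.compare(a.pos, b.pos));
            }
            // Push set-aside terms back so they are popped again in position order.
            for (int i = deferred.size() - 1; i >= 0; i--) stack.addFirst(deferred.get(i));
        }
        return phrases;
    }

    static String join(List<Tok> cand) {
        StringJoiner sj = new StringJoiner(" ");
        for (Tok t : cand) sj.add(t.text);
        return sj.toString();
    }
}
```

For the second example (B, C, E and F at positions 1, 2, 4 and 5), C is set aside while "B E" is matched, then pushed back and matched as part of "C F", so match returns [b e, c f]; for the first example it returns [b e, d]. Each outer iteration consumes its starting term for good, so the loop always terminates.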
Hopefully this makes sense. If nobody sees any obvious flaws, I may try to
create a patch.