[
https://issues.apache.org/jira/browse/LUCENE-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ryan Lauck updated LUCENE-4734:
-------------------------------
Fix Version/s: 5.0, 4.2
Description:
If a proximity phrase query overlaps with any other query term, the proximity
phrase will not be highlighted.
Example Text: A B C D E F G
Example Queries:
"B E"~10 D
(D will be highlighted instead of "B C D E")
"B E"~10 "C F"~10
(nothing will be highlighted)
This can be traced to the inner while loop of the FieldPhraseList constructor.
In the first example query, the first TermInfo popped off the stack is "B".
The second TermInfo is "D", which is not found in the submap for "B E"~10 and
therefore triggers a failed match.
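The failure can be reproduced with a small standalone model of the greedy scan.
This is only a sketch under stated assumptions: GreedyPhraseScan and its methods
are hypothetical names, not Lucene classes, and positions and slop are left out
because the match is abandoned before either would be consulted. Query phrases
are plain term lists, and the "stack" holds the document terms that occur in the
query, in position order.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class GreedyPhraseScan {

    /** True if some query phrase starts with the candidate term sequence. */
    static boolean isPrefix(List<List<String>> phrases, List<String> candidate) {
        for (List<String> p : phrases) {
            if (p.size() >= candidate.size()
                    && p.subList(0, candidate.size()).equals(candidate)) {
                return true;
            }
        }
        return false;
    }

    /** Greedy scan: the next term on the stack must extend the current phrase. */
    static List<List<String>> scan(List<String> doc, List<List<String>> phrases) {
        Set<String> queryTerms = new HashSet<>();
        for (List<String> p : phrases) queryTerms.addAll(p);

        // Only document terms that appear in the query end up on the stack.
        Deque<String> stack = new ArrayDeque<>();
        for (String t : doc) if (queryTerms.contains(t)) stack.addLast(t);

        List<List<String>> matches = new ArrayList<>();
        while (!stack.isEmpty()) {
            List<String> candidate = new ArrayList<>();
            candidate.add(stack.pollFirst());
            if (!isPrefix(phrases, candidate)) continue; // term starts no phrase

            while (true) {
                String next = stack.peekFirst();
                if (next != null) {
                    List<String> extended = new ArrayList<>(candidate);
                    extended.add(next);
                    if (isPrefix(phrases, extended)) {
                        candidate.add(stack.pollFirst());
                        continue; // keep extending the phrase
                    }
                }
                // The next term does not extend the phrase: the search stops here.
                if (phrases.contains(candidate)) {
                    matches.add(candidate);
                } else {
                    // Push back everything but the first term, then retry it alone.
                    while (candidate.size() > 1) {
                        stack.addFirst(candidate.remove(candidate.size() - 1));
                    }
                    if (phrases.contains(candidate)) matches.add(candidate);
                }
                break;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> doc = Arrays.asList("A", "B", "C", "D", "E", "F", "G");
        // "B E"~10 D: "D" sits between "B" and "E", so only D matches.
        System.out.println(scan(doc, Arrays.asList(
                Arrays.asList("B", "E"), Arrays.asList("D"))));      // [[D]]
        // "B E"~10 "C F"~10: each phrase interrupts the other.
        System.out.println(scan(doc, Arrays.asList(
                Arrays.asList("B", "E"), Arrays.asList("C", "F")))); // []
    }
}
```

With only "B E"~10 in the query, the same scan returns the phrase, which suggests
the overlap itself is what breaks the match.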
was:
If a proximity phrase query overlaps with any other query term, the proximity
phrase will not be highlighted.
Example Text: A B C D E F G
Example Queries:
"B E"~10 D
(only D will be highlighted)
"B E"~10 "C F"~10
(neither phrase will be highlighted)
This can be traced to the inner while loop of the FieldPhraseList constructor.
In the first example query, the first TermInfo popped off the stack is "B".
The second TermInfo is "D", which is not found in the submap for "B E"~10 and
therefore triggers a failed match.
I wanted to report this issue before digging into a solution, but my first
thought is:
Add an int property to QueryPhraseMap that stores the maximum possible phrase
width for each term, based on the proximity searches the term is part of
(defaulting to zero; in the above examples it would be 10).
If a term popped off the stack is not part of the proximity phrase being
matched (currMap.getTermMap(ti.getText()) == null), add it to a temporary list
until either the longest possible phrase is successfully matched or a term is
found outside the maximum possible phrase width.
After this search is complete, push any non-matching terms from the temporary
list back onto the stack so they are evaluated again, and clear the temporary
list.
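The steps above can be sketched as a standalone model. This is an illustration
under stated assumptions, not the attached patch: DeferredPhraseScan, Phrase,
Term and maxWindow are hypothetical names, the window is assumed to be the slop
plus the phrase length minus one, and the final slop validation that the real
highlighter performs is omitted.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

public class DeferredPhraseScan {
    /** Hypothetical stand-in for a query phrase: ordered terms plus slop. */
    static final class Phrase {
        final int slop;
        final List<String> terms;
        Phrase(int slop, String... terms) { this.slop = slop; this.terms = Arrays.asList(terms); }
    }

    /** Hypothetical stand-in for one term occurrence in the document. */
    static final class Term {
        final String text;
        final int pos;
        Term(String text, int pos) { this.text = text; this.pos = pos; }
    }

    /** True if a phrase with at least minLength terms starts with the candidate. */
    static boolean isPrefix(List<Phrase> phrases, List<String> cand, int minLength) {
        for (Phrase p : phrases) {
            if (p.terms.size() >= Math.max(minLength, cand.size())
                    && p.terms.subList(0, cand.size()).equals(cand)) return true;
        }
        return false;
    }

    /** True if the candidate is exactly one of the query phrases. */
    static boolean isComplete(List<Phrase> phrases, List<String> cand) {
        for (Phrase p : phrases) if (p.terms.equals(cand)) return true;
        return false;
    }

    /** Assumed window: largest (slop + length - 1) over phrases starting with first. */
    static int maxWindow(List<Phrase> phrases, String first) {
        int w = 0;
        for (Phrase p : phrases) {
            if (p.terms.get(0).equals(first)) w = Math.max(w, p.slop + p.terms.size() - 1);
        }
        return w;
    }

    /** Scan that defers non-matching terms inside the window instead of failing. */
    static List<List<String>> scan(List<String> doc, List<Phrase> phrases) {
        Deque<Term> stack = new ArrayDeque<>();
        for (int i = 0; i < doc.size(); i++) {
            for (Phrase p : phrases) {
                if (p.terms.contains(doc.get(i))) { stack.addLast(new Term(doc.get(i), i)); break; }
            }
        }
        List<List<String>> matches = new ArrayList<>();
        while (!stack.isEmpty()) {
            Term first = stack.pollFirst();
            List<String> cand = new ArrayList<>(Arrays.asList(first.text));
            if (!isPrefix(phrases, cand, 1)) continue;
            int window = maxWindow(phrases, first.text);
            List<Term> popped = new ArrayList<>();   // every term popped after first
            List<Term> deferred = new ArrayList<>(); // popped terms not in the phrase
            // Continue while a longer phrase is possible and terms are in the window.
            while (!stack.isEmpty() && isPrefix(phrases, cand, cand.size() + 1)) {
                Term next = stack.peekFirst();
                if (next.pos - first.pos > window) break;
                List<String> ext = new ArrayList<>(cand);
                ext.add(next.text);
                if (isPrefix(phrases, ext, ext.size())) cand.add(next.text);
                else deferred.add(next);
                popped.add(stack.pollFirst());
            }
            List<Term> restore = deferred;
            if (isComplete(phrases, cand)) {
                matches.add(cand);
            } else {
                restore = popped; // failed match: give every popped term back
                List<String> single = new ArrayList<>(cand.subList(0, 1));
                if (isComplete(phrases, single)) matches.add(single);
            }
            // Push terms back in their original order so they are evaluated again.
            for (int i = restore.size() - 1; i >= 0; i--) stack.addFirst(restore.get(i));
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> doc = Arrays.asList("A", "B", "C", "D", "E", "F", "G");
        Phrase be = new Phrase(10, "B", "E");
        System.out.println(scan(doc, Arrays.asList(be, new Phrase(0, "D"))));       // [[B, E], [D]]
        System.out.println(scan(doc, Arrays.asList(be, new Phrase(10, "C", "F")))); // [[B, E], [C, F]]
    }
}
```

In this model both example queries match every phrase, and a term that falls
outside the window still ends the search, as in the description above.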
Hopefully this makes sense. If nobody sees any obvious flaws, I may try to
create a patch.
Lucene Fields: New, Patch Available (was: New)
Affects Version/s: 5.0
> FastVectorHighlighter Overlapping Proximity Queries Do Not Highlight
> --------------------------------------------------------------------
>
> Key: LUCENE-4734
> URL: https://issues.apache.org/jira/browse/LUCENE-4734
> Project: Lucene - Core
> Issue Type: Bug
> Components: modules/highlighter
> Affects Versions: 4.0, 4.1, 5.0
> Reporter: Ryan Lauck
> Labels: fastvectorhighlighter, highlighter
> Fix For: 4.2, 5.0
>
> Attachments: lucene-fvh-slop.patch
>
>
> If a proximity phrase query overlaps with any other query term, the proximity
> phrase will not be highlighted.
> Example Text: A B C D E F G
> Example Queries:
> "B E"~10 D
> (D will be highlighted instead of "B C D E")
> "B E"~10 "C F"~10
> (nothing will be highlighted)
> This can be traced to the inner while loop of the FieldPhraseList constructor.
> In the first example query, the first TermInfo popped off the stack is "B".
> The second TermInfo is "D", which is not found in the submap for "B E"~10 and
> therefore triggers a failed match.