Ryan Lauck created LUCENE-4734:
----------------------------------

             Summary: FastVectorHighlighter Overlapping Proximity Queries Do Not Highlight
                 Key: LUCENE-4734
                 URL: https://issues.apache.org/jira/browse/LUCENE-4734
             Project: Lucene - Core
          Issue Type: Bug
          Components: modules/highlighter
    Affects Versions: 4.1, 4.0
            Reporter: Ryan Lauck


If a proximity phrase query overlaps with any other query term, the phrase will 
not be highlighted.

Example Text:  A B C D E F G

Example Queries: 

"B E"~10 D
(only D will be highlighted)

"B E"~10 "C F"~10
(neither phrase will be highlighted)


This can be traced to the inner while loop of the FieldPhraseList constructor. 
In the first example query, the first TermInfo popped off the stack will be 
"B". The second TermInfo will be "D", which will not be found in the submap for 
"B E"~10 and so triggers a failed match.
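
To make the failure easier to see, here is a minimal standalone model of that 
strict matching. It is not the Lucene code: StrictMatchSketch, highlight, and 
the list-of-terms query representation are hypothetical names made up for this 
illustration, and slop is left out because the lookup fails on the next stacked 
term regardless of distance. The stack holds only the query terms found in the 
example text, in position order.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical model of the strict matching described above; not Lucene code.
class StrictMatchSketch {

    // hits: query terms found in the text, in position order.
    // queries: each query as its list of terms (a single term is a one-element list).
    static List<String> highlight(List<String> hits, List<List<String>> queries) {
        Deque<String> stack = new ArrayDeque<>(hits);
        List<String> highlighted = new ArrayList<>();
        while (!stack.isEmpty()) {
            String first = stack.pop();
            for (List<String> query : queries) {
                if (!query.get(0).equals(first)) {
                    continue;
                }
                // The rest of the phrase must be exactly the next terms on the stack.
                List<String> upcoming = new ArrayList<>(stack);
                int rest = query.size() - 1;
                if (upcoming.size() >= rest
                        && upcoming.subList(0, rest).equals(query.subList(1, query.size()))) {
                    highlighted.addAll(query);
                    for (int i = 0; i < rest; i++) {
                        stack.pop();
                    }
                    break;
                }
            }
        }
        return highlighted;
    }

    public static void main(String[] args) {
        // Text "A B C D E F G", query: "B E"~10 D
        System.out.println(highlight(
                Arrays.asList("B", "D", "E"),
                Arrays.asList(Arrays.asList("B", "E"), Arrays.asList("D"))));      // prints [D]
        // Text "A B C D E F G", query: "B E"~10 "C F"~10
        System.out.println(highlight(
                Arrays.asList("B", "C", "E", "F"),
                Arrays.asList(Arrays.asList("B", "E"), Arrays.asList("C", "F")))); // prints []
    }
}
```

In this model "D" sits between "B" and "E" on the stack, so "B E" never sees 
"E" as its next term, which mirrors both example results above.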

I wanted to report this issue before digging into a solution but my first 
thought is:

Add an additional int property to QueryPhraseMap to store the maximum possible 
phrase width for each term, based on any proximity searches it is part of 
(defaulting to zero; in the above examples it would be 10).

If a term is popped off the stack that is not a part of a proximity phrase 
being matched ( currMap.getTermMap(ti.getText()) == null ), it is added to a 
temporary list until either the longest possible phrase is successfully matched 
or a term is found outside the maximum possible phrase width.

After this search is complete, any non-matching terms that were added to the 
temporary list are pushed back onto the stack to be evaluated again and the 
temp list is cleared.
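
A rough sketch of how this could look, written against a small standalone model 
rather than the real FieldPhraseList and QueryPhraseMap classes. Everything 
here is hypothetical: the Hit and Phrase classes, the tryMatch helper, and in 
particular the choice to compute the maximum width as (number of terms - 1) + 
slop, which is only one possible reading of "maximum possible phrase width". 
It also assumes phrase terms appear in query order, and on a failed match it 
pushes every popped term back, which the description above does not spell out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical model of the proposed push-back matching; not Lucene code.
class ProximityPushbackSketch {

    // A query term occurrence in the text (stand-in for a TermInfo).
    static final class Hit {
        final String text;
        final int position;
        Hit(String text, int position) { this.text = text; this.position = position; }
    }

    // A phrase with slop; a single-term query is a one-term phrase with slop 0.
    static final class Phrase {
        final List<String> terms;
        final int slop;
        Phrase(int slop, String... terms) { this.terms = Arrays.asList(terms); this.slop = slop; }
        // Assumption: widest position span a match may cover.
        int maxWidth() { return terms.size() - 1 + slop; }
    }

    static List<String> highlight(List<Hit> hits, List<Phrase> phrases) {
        Deque<Hit> stack = new ArrayDeque<>(hits);
        List<String> highlighted = new ArrayList<>();
        while (!stack.isEmpty()) {
            Hit first = stack.pop();
            for (Phrase phrase : phrases) {
                if (phrase.terms.get(0).equals(first.text) && tryMatch(first, phrase, stack)) {
                    highlighted.addAll(phrase.terms);
                    break;
                }
            }
        }
        return highlighted;
    }

    private static boolean tryMatch(Hit first, Phrase phrase, Deque<Hit> stack) {
        List<Hit> popped = new ArrayList<>();   // everything taken off the stack, in order
        List<Hit> matched = new ArrayList<>();  // the subset that belongs to the phrase
        int next = 1;
        // Stop when the phrase is complete, the stack is empty,
        // or the next term lies outside the maximum width.
        while (next < phrase.terms.size() && !stack.isEmpty()
                && stack.peek().position - first.position <= phrase.maxWidth()) {
            Hit hit = stack.pop();
            popped.add(hit);
            if (hit.text.equals(phrase.terms.get(next))) {
                matched.add(hit);
                next++;
            }
        }
        boolean complete = next == phrase.terms.size();
        // Push non-matching terms back (all popped terms on failure), preserving order.
        for (int i = popped.size() - 1; i >= 0; i--) {
            Hit hit = popped.get(i);
            if (!complete || !matched.contains(hit)) {
                stack.push(hit);
            }
        }
        return complete;
    }

    public static void main(String[] args) {
        // Text "A B C D E F G" (positions 0..6), query: "B E"~10 D
        List<Hit> hits = Arrays.asList(new Hit("B", 1), new Hit("D", 3), new Hit("E", 4));
        List<Phrase> phrases = Arrays.asList(new Phrase(10, "B", "E"), new Phrase(0, "D"));
        System.out.println(highlight(hits, phrases));  // prints [B, E, D]
    }
}
```

With the second example query, "C" is set aside while "B E" is matched, pushed 
back, and then starts its own match against "F", so both phrases are reported.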

Hopefully this makes sense, and if nobody sees any obvious flaws I may try to 
create a patch.




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
