[ 
https://issues.apache.org/jira/browse/LUCENE-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712427#comment-13712427
 ] 

Ryan Lauck commented on LUCENE-4734:
------------------------------------

Thanks Adrien!

I agree about LUCENE-2878. Before finding that someone had already done most 
of the work, I had come to the same conclusion: the ideal scenario is to 
(optionally) pull postings or term vectors in addition to payloads while 
scoring and expose them for highlighting. I'm looking forward to that patch too!

An idea I began working on but haven't polished enough to submit a patch for:

Users of the API could access the raw highlight metadata (offsets and 
positions) and post-process it to merge, filter, or ignore overlapping 
highlights. One flaw I've had to work around in existing highlighters is that 
when highlights overlap they either merge them or toss all but the first one 
encountered. We perform the highlighting manually in our system and hope one 
day to let end users toggle which terms are highlighted without making 
round-trips to the server to modify the search criteria and rerun the 
highlighter. With raw offset data this is trivial, and merging or discarding 
overlaps can be handled in client-side code. There are additional advantages 
too, such as being able to highlight find-in-page or search-within-search 
results while transferring only the new offset metadata rather than the entire 
text over the wire (we have some very big documents, 100MB+).
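
To make the client-side part concrete, here is a rough sketch of what I have 
in mind, assuming the server hands back one record per match with the query 
clause it came from and its character offsets. The HighlightSpans class, the 
Span type, and the merge method are hypothetical names for illustration only; 
none of this is Lucene API or part of any patch. Spans are treated as 
end-exclusive, and spans that merely touch are left separate.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;

public class HighlightSpans {

    /** Hypothetical raw highlight record: source clause plus character offsets (end exclusive). */
    public static final class Span {
        final String clause;
        final int start;
        final int end;

        public Span(String clause, int start, int end) {
            this.clause = clause;
            this.start = start;
            this.end = end;
        }
    }

    /**
     * Keeps only spans whose clause is currently enabled, then merges
     * overlapping spans into [start, end) ranges sorted by start offset.
     */
    public static List<int[]> merge(List<Span> spans, Set<String> enabledClauses) {
        List<Span> kept = new ArrayList<>();
        for (Span s : spans) {
            if (enabledClauses.contains(s.clause)) {
                kept.add(s);
            }
        }
        kept.sort(Comparator.comparingInt(s -> s.start));

        List<int[]> merged = new ArrayList<>();
        for (Span s : kept) {
            int[] last = merged.isEmpty() ? null : merged.get(merged.size() - 1);
            if (last != null && s.start < last[1]) {
                // Overlaps the previous range: extend it instead of dropping the span.
                last[1] = Math.max(last[1], s.end);
            } else {
                merged.add(new int[] {s.start, s.end});
            }
        }
        return merged;
    }
}
```

Toggling a term on or off is then just a call to merge with a different set 
of enabled clauses over offsets the client already holds, so nothing has to 
be re-queried or re-sent. A client that prefers to discard overlaps rather 
than merge them could swap out the loop body without touching the server.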
                
> FastVectorHighlighter Overlapping Proximity Queries Do Not Highlight
> --------------------------------------------------------------------
>
>                 Key: LUCENE-4734
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4734
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: modules/highlighter
>    Affects Versions: 4.0, 4.1, 5.0
>            Reporter: Ryan Lauck
>              Labels: fastvectorhighlighter, highlighter
>             Fix For: 4.4
>
>         Attachments: lucene-4734.patch, LUCENE-4734.patch
>
>
> If a proximity phrase query overlaps with any other query term it will not be 
> highlighted.
> Example Text:  A B C D E F G
> Example Queries: 
> "B E"~10 D
> (D will be highlighted instead of "B C D E")
> "B E"~10 "C F"~10
> (nothing will be highlighted)
> This can be traced to the FieldPhraseList constructor's inner while loop. 
> From the first example query, the first TermInfo popped off the stack will be 
> "B". The second TermInfo will be "D" which will not be found in the submap 
> for "B E"~10 and will trigger a failed match.

