While the overall framework of LUCENE-663 appears similar to the current contrib Highlighter, the code is actually quite different, and I do not think it handles as many corner cases in its current state. LUCENE-663 supports PhraseQuerys by implementing 'special' search logic that inspects positional information to make sure the Tokens from a PhraseQuery are in order. I am not sure how exact this logic is compared to Lucene's PhraseQuery search logic, but a cursory look makes me think it's not complete. It almost looks to me like it only does in-order matching with simple slop (not edit distance)... I am too lazy to check further, though, and I may have missed something. Also, LUCENE-663 does not support Span queries.

This patch differs in that it fits the current Highlighter framework without modifying it, and it uses Lucene's own internal search logic to identify Spans for highlighting. PhraseQueries are handled by a SpanQuery approximation.

As far as PhraseQuery/SpanQuery highlighting goes, I don't think any of the other Highlighter packages offer much. I think that things could be done a little faster, but that would require abandoning the current framework, and with all of the corner cases it now supports, I'd hate to see that.

The other Highlighter code that is worth consideration is LUCENE-644. It does abandon the current Highlighter framework and takes an approach that is much more efficient for larger documents: instead of spinning through all of the document tokens and comparing them to the query tokens, 644 just looks at the tokens from the query and grabs the original text using the offsets from those tokens. This is darn fast, but it doesn't go well with positional highlighting, and I wonder how well it supports all of the corner cases that arise with overlapping tokens and whatnot.
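
To make the offset-driven idea concrete, here is a rough sketch of what "grab the original text using the offsets" could look like. This is not the LUCENE-644 code; the class name, the Offset type, and the <B> markup are all made up for illustration, and it simply skips overlapping tokens rather than handling them properly.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch, not taken from any Lucene patch.
class OffsetHighlightSketch {

    /** Character range [start, end) of one matching token in the original text. */
    static final class Offset {
        final int start;
        final int end;

        Offset(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    /** Wraps each hit in <B>..</B> using only the stored offsets; the text is never re-tokenized. */
    static String highlight(String text, List<Offset> hits) {
        // Work on a sorted copy so hits can arrive in any order.
        List<Offset> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingInt((Offset o) -> o.start));

        StringBuilder out = new StringBuilder();
        int cursor = 0;
        for (Offset hit : sorted) {
            if (hit.start < cursor) {
                continue; // overlapping token: ignored in this sketch
            }
            out.append(text, cursor, hit.start);
            out.append("<B>").append(text, hit.start, hit.end).append("</B>");
            cursor = hit.end;
        }
        out.append(text, cursor, text.length());
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "the quick brown fox";
        // Offsets for "fox" and "quick", deliberately out of order.
        List<Offset> hits = List.of(new Offset(16, 19), new Offset(4, 9));
        System.out.println(highlight(text, hits));
    }
}
```

The work here is proportional to the number of hits rather than the number of document tokens, which is where the speed comes from. The skipped-overlap branch is exactly the kind of corner case I would worry about.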

- Mark

Grant Ingersoll (JIRA) wrote:
[ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12558784#action_12558784 ]
Grant Ingersoll commented on LUCENE-794:
----------------------------------------

How should this relate to LUCENE-663?  Seems like that one also covers other 
kinds of queries?  I'm no expert in highlighting, but it seems like there are at 
least 3 different issues in JIRA for enabling things like phrase queries, etc.  
Should we try to consolidate these?

Extend contrib Highlighter to properly support phrase queries and span queries
------------------------------------------------------------------------------

                Key: LUCENE-794
                URL: https://issues.apache.org/jira/browse/LUCENE-794
            Project: Lucene - Java
         Issue Type: Improvement
         Components: Other
           Reporter: Mark Miller
           Priority: Minor
        Attachments: spanhighlighter.patch, spanhighlighter10.patch, 
spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, 
spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, 
spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, 
spanhighlighter_patch_4.zip


This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package 
that scores just like QueryScorer, but scores a 0 for Terms that did not cause 
the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys 
and PhraseQuery. There is also a new Fragmenter that attempts to fragment 
without breaking up Spans.
See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
There is a dependency on MemoryIndex.
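
As a hedged illustration of "scores a 0 for Terms that did not cause the Query hit", the sketch below scores a token only when its position falls inside a span that actually matched. It is not the SpanQueryScorer from the patch: the class, the PositionSpan type, and the scoreToken method are hypothetical names, and the matched spans are assumed to have been collected already (for example by running the query against the text).

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the SpanQueryScorer from the patch.
class SpanScoreSketch {

    /** Token positions covered by one matched span: start inclusive, end exclusive. */
    static final class PositionSpan {
        final int start;
        final int end;

        PositionSpan(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    /**
     * Returns the weight if the token at this position lies inside a span that
     * matched for its term, and 0 otherwise (including for non-query terms).
     */
    static float scoreToken(String term, int position,
                            Map<String, List<PositionSpan>> spansByTerm, float weight) {
        List<PositionSpan> spans = spansByTerm.get(term);
        if (spans == null) {
            return 0f; // term does not appear in the query at all
        }
        for (PositionSpan span : spans) {
            if (position >= span.start && position < span.end) {
                return weight;
            }
        }
        return 0f; // query term, but this occurrence did not cause the hit
    }

    public static void main(String[] args) {
        // Tokens: new(0) york(1) is(2) not(3) york(4) new(5)
        // The phrase "new york" matches only at positions 0..1, i.e. span [0, 2).
        List<PositionSpan> matched = List.of(new PositionSpan(0, 2));
        Map<String, List<PositionSpan>> spansByTerm = Map.of("new", matched, "york", matched);

        System.out.println(scoreToken("york", 1, spansByTerm, 1.0f)); // inside the span
        System.out.println(scoreToken("york", 4, spansByTerm, 1.0f)); // outside the span
    }
}
```

A plain term-based scorer would give both occurrences of "york" the same score; keying on position is what restricts highlighting to the occurrence that took part in the phrase match.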

