[jira] Commented: (LUCENE-403) Alternate Lucene Query Highlighter

Mark Miller (JIRA) Sun, 25 May 2008 05:08:21 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12599678#action_12599678
 ]


Mark Miller commented on LUCENE-403:
------------------------------------

I would say we already have all of the functionality of this patch, and more. 
I have not checked how well it handles all of the corner cases, but it looks 
like Mark H did a bit of that. I would say it currently offers no additional 
functional value, though it may be faster than what we have for PhraseQuery 
(it does not support Spans). The patch uses the offsets from the TokenStream 
for highlighting and just makes sure a PhraseQuery's terms are next to each 
other (I am not sure how exactly this emulates slop), so it can be rather fast 
on larger docs.
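
To make the technique concrete, here is a minimal sketch of offset-based 
phrase highlighting. It is not the code from the patch and it does not use the 
Lucene API: the class PhraseOffsetSketch, the Tok holder, the whitespace 
tokenizer standing in for a TokenStream, and the <B> markers are all 
assumptions made for illustration. It only handles exact phrases (slop 0).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: none of these names come from the LUCENE-403 patch or from Lucene.
class PhraseOffsetSketch {

    /** A token as an analyzer might report it: term, position, and character offsets. */
    static final class Tok {
        final String term;
        final int position;
        final int start;
        final int end;

        Tok(String term, int position, int start, int end) {
            this.term = term;
            this.position = position;
            this.start = start;
            this.end = end;
        }
    }

    /** Lowercasing whitespace tokenizer, standing in for a real TokenStream (assumption). */
    static List<Tok> tokenize(String text) {
        List<Tok> out = new ArrayList<>();
        int pos = 0;
        int i = 0;
        while (i < text.length()) {
            while (i < text.length() && Character.isWhitespace(text.charAt(i))) i++;
            int start = i;
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) i++;
            if (i > start) {
                out.add(new Tok(text.substring(start, i).toLowerCase(), pos++, start, i));
            }
        }
        return out;
    }

    /** Returns [start, end) character offsets for each exact (slop 0) phrase occurrence. */
    static List<int[]> phraseOffsets(List<Tok> tokens, List<String> phrase) {
        List<int[]> hits = new ArrayList<>();
        if (phrase.isEmpty()) return hits;
        for (int i = 0; i + phrase.size() <= tokens.size(); i++) {
            boolean match = true;
            for (int j = 0; j < phrase.size(); j++) {
                Tok t = tokens.get(i + j);
                // Terms must match and sit at consecutive positions.
                if (!t.term.equals(phrase.get(j))
                        || t.position != tokens.get(i).position + j) {
                    match = false;
                    break;
                }
            }
            if (match) {
                hits.add(new int[] {tokens.get(i).start, tokens.get(i + phrase.size() - 1).end});
            }
        }
        return hits;
    }

    /** Wraps each hit in <B>...</B>, using only the recorded offsets; overlapping hits are skipped. */
    static String highlight(String text, List<int[]> hits) {
        StringBuilder sb = new StringBuilder();
        int last = 0;
        for (int[] h : hits) {
            if (h[0] < last) continue;
            sb.append(text, last, h[0]).append("<B>").append(text, h[0], h[1]).append("</B>");
            last = h[1];
        }
        sb.append(text, last, text.length());
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "The quick brown fox saw a quick red fox";
        List<int[]> hits = phraseOffsets(tokenize(text), Arrays.asList("quick", "brown"));
        System.out.println(highlight(text, hits));
    }
}
```

Because the sketch makes a single pass over the tokens and then splices the 
original text by offset, its cost grows with document length and not with the 
number of fragments scored, which is the property that makes this style of 
approach attractive on large documents. Supporting slop or Spans would need 
more than the adjacency check shown here.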

I analyzed all of the old Highlight code in JIRA when considering how best to 
do the SpanScorer, and passed on each of them for one reason or another. The 
main reasons for passing on this one were the lack of Span support, the loss 
of current highlighter features/API, and the pseudo-duplication of Lucene 
phrase query searching in the Highlighter code. I think a solution that 
doesn't duplicate Query code is much cleaner.

So I don't think this is very useful with regard to the general Highlighter. 
The idea of using Token offset info to do the highlighting was also tried in 
Ronnie's JIRA issue (though in that case it was done through TermVectors and 
not from the TokenStream), and while it proves to be faster on large 
documents, it doesn't appear easy to retain the speed when working with Spans, 
and it doesn't fit well with the old API.

If we ditch the old API some day, though, things may change: I have been 
playing around with this technique in my LargeDocHighlighter, and I still have 
hope that it will go somewhere. I just don't see the old token scoring API 
being thrown away in the near future.



> Alternate Lucene Query Highlighter
> ----------------------------------
>
>                 Key: LUCENE-403
>                 URL: https://issues.apache.org/jira/browse/LUCENE-403
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>    Affects Versions: 1.4
>         Environment: Operating System: All
> Platform: All
>            Reporter: David Bohl
>            Priority: Minor
>         Attachments: HighlighterTest.java, HighlighterTest.java, 
> QueryHighlighter.java, QueryHighlighter.java, QueryHighlighter.java, 
> QuerySpansExtractor.java
>
>
> I created a lucene query highlighter (borrowing some code from the one in
> the sandbox) that my company is using.  It better handles phrase queries,
> doesn't break HTML entities, and has the ability to either highlight terms
> in an entire document or to highlight fragments from the document.  I would 
> like to make it available to anyone who wants it.
