[
https://issues.apache.org/jira/browse/LUCENE-403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12599678#action_12599678
]
Mark Miller commented on LUCENE-403:
------------------------------------
I would say we already have all of the functionality of this patch, and more. I
have not checked how well it handles all of the corner cases, but it looks like
Mark H did a bit of that. I would say it currently offers no added functional
value, though it may be faster than what we have for PhraseQuery (it does not
support Spans). The patch uses the offsets from the TokenStream for
highlighting and just makes sure a PhraseQuery's terms are next to each other
(I am not sure how exactly this emulates slop), so it can be rather fast on
larger docs.
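To make the offset-based idea concrete, here is a minimal sketch in plain Java.
It is not the code from the patch and does not use the Lucene API: the Tok
class, the whitespace tokenizer, and the method names are all hypothetical
stand-ins for what an analyzer's TokenStream would supply. It only checks exact
adjacency of the phrase terms and makes no attempt to emulate slop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class PhraseOffsetSketch {

    /** Hypothetical stand-in for an analyzed token: term, position, character offsets. */
    static final class Tok {
        final String term;
        final int position;
        final int start;
        final int end;

        Tok(String term, int position, int start, int end) {
            this.term = term;
            this.position = position;
            this.start = start;
            this.end = end;
        }
    }

    /** Toy whitespace tokenizer (assumption); a real analyzer would provide these values. */
    static List<Tok> tokenize(String text) {
        List<Tok> out = new ArrayList<>();
        int pos = 0;
        int i = 0;
        int n = text.length();
        while (i < n) {
            while (i < n && Character.isWhitespace(text.charAt(i))) i++;
            int start = i;
            while (i < n && !Character.isWhitespace(text.charAt(i))) i++;
            if (i > start) {
                out.add(new Tok(text.substring(start, i).toLowerCase(), pos++, start, i));
            }
        }
        return out;
    }

    /** Returns {startOffset, endOffset} for each run of tokens matching the phrase at consecutive positions. */
    static List<int[]> findPhrase(List<Tok> tokens, List<String> phrase) {
        List<int[]> hits = new ArrayList<>();
        if (phrase.isEmpty()) return hits;
        for (int i = 0; i + phrase.size() <= tokens.size(); i++) {
            boolean match = true;
            for (int j = 0; j < phrase.size(); j++) {
                Tok t = tokens.get(i + j);
                boolean adjacent = j == 0 || t.position == tokens.get(i + j - 1).position + 1;
                if (!adjacent || !t.term.equals(phrase.get(j))) {
                    match = false;
                    break;
                }
            }
            if (match) {
                hits.add(new int[] {tokens.get(i).start, tokens.get(i + phrase.size() - 1).end});
            }
        }
        return hits;
    }

    /** Wraps each non-overlapping hit in <B>...</B> using only the recorded offsets. */
    static String highlight(String text, List<int[]> hits) {
        StringBuilder sb = new StringBuilder();
        int last = 0;
        for (int[] h : hits) {
            if (h[0] < last) continue; // skip hits that overlap one already emitted
            sb.append(text, last, h[0]).append("<B>").append(text, h[0], h[1]).append("</B>");
            last = h[1];
        }
        sb.append(text.substring(last));
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "the quick brown fox met a quick red fox";
        List<int[]> hits = findPhrase(tokenize(text), Arrays.asList("quick", "brown"));
        System.out.println(highlight(text, hits));
    }
}
```

Because the highlighting step only slices the original text at recorded
offsets, the work per hit is small, which is presumably where the speed on
large documents comes from.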
I analyzed all of the old Highlight code in JIRA when considering how best to
do the SpanScorer, and passed on each of them for one reason or another. The
main reasons for passing on this one were the lack of Span support, the loss of
current Highlighter features/API, and the pseudo-duplication of Lucene phrase
query searching in the Highlighter code. I think a solution that doesn't
duplicate Query code is much cleaner.
So I don't think this is very useful with regard to the general Highlighter. The
idea of using Token offset info to do the Highlighting was also tried in
Ronnie's JIRA issue (though in that case it was done through TermVectors and
not from the TokenStream), and while it proves to be faster on large documents,
it doesn't appear easy to retain the speed when working with Spans, and it
doesn't fit well with the old API.
Should we ditch the old API some day, though, this technique may become useful:
I have been playing around with it in my LargeDocHighlighter, and I still have
hope that it will go somewhere. I just don't see the old token scoring API
being thrown away in the near future.
> Alternate Lucene Query Highlighter
> ----------------------------------
>
> Key: LUCENE-403
> URL: https://issues.apache.org/jira/browse/LUCENE-403
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Other
> Affects Versions: 1.4
> Environment: Operating System: All
> Platform: All
> Reporter: David Bohl
> Priority: Minor
> Attachments: HighlighterTest.java, HighlighterTest.java,
> QueryHighlighter.java, QueryHighlighter.java, QueryHighlighter.java,
> QuerySpansExtractor.java
>
>
> I created a lucene query highlighter (borrowing some code from the one in
> the sandbox) that my company is using. It better handles phrase queries,
> doesn't break HTML entities, and has the ability to either highlight terms
> in an entire document or to highlight fragments from the document. I would
> like to make it available to anyone who wants it.