LargeDocHighlighter - another span highlighter optimized for large documents
----------------------------------------------------------------------------
Key: LUCENE-1286
URL: https://issues.apache.org/jira/browse/LUCENE-1286
Project: Lucene - Java
Issue Type: Improvement
Components: contrib/highlighter
Affects Versions: 2.4
Reporter: Mark Miller
The existing Highlighter API is rich and well designed, but the approach taken
is not very efficient for large documents.
I believe this is because the current Highlighter rebuilds the document by
running through and scoring every token in the TokenStream.
With a break in the current API, an alternate approach can be taken: rebuild
the document by iterating over the query term matches and using their offsets.
The benefit is clear - a large doc will have a large TokenStream, but the set
of query term matches will likely be very small in comparison.
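
To make the idea concrete, here is a minimal sketch of offset-driven
highlighting. It is not the patch for this issue and uses no Lucene classes:
the class OffsetHighlightSketch, the Match holder, and the highlight method are
hypothetical names made up for illustration. It assumes the start/end character
offsets of the query term matches are already available (how they are obtained
is outside the sketch) and simply copies the text between matches, so no
per-token scoring is involved.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class OffsetHighlightSketch {

    /** Hypothetical holder for one query term match: [start, end) character offsets. */
    static final class Match {
        final int start;
        final int end;

        Match(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    /**
     * Rebuilds the text by walking the matches in offset order and wrapping
     * each one in pre/post tags. Invalid or overlapping matches are skipped.
     */
    static String highlight(String text, List<Match> matches, String pre, String post) {
        // Sort a copy so the caller's list is left untouched.
        List<Match> sorted = new ArrayList<>(matches);
        sorted.sort(Comparator.comparingInt((Match m) -> m.start));

        StringBuilder out = new StringBuilder(text.length());
        int cursor = 0; // end of the text already copied to the output
        for (Match m : sorted) {
            if (m.start < cursor || m.end > text.length() || m.end <= m.start) {
                continue; // overlaps the previous match or is out of range
            }
            out.append(text, cursor, m.start);                          // unmatched gap
            out.append(pre).append(text, m.start, m.end).append(post);  // the match
            cursor = m.end;
        }
        out.append(text, cursor, text.length()); // tail after the last match
        return out.toString();
    }

    public static void main(String[] args) {
        List<Match> matches = new ArrayList<>();
        matches.add(new Match(4, 9));   // "quick"
        matches.add(new Match(16, 19)); // "fox"
        // prints: the <B>quick</B> brown <B>fox</B>
        System.out.println(highlight("the quick brown fox", matches, "<B>", "</B>"));
    }
}
```

The loop runs once per match rather than once per token, which is where the
expected saving on large documents would come from. Fragmenting and scoring are
left out of the sketch entirely.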
I expect this approach to be quite a bit faster for very large documents, while
still supporting Phrase and Span queries.
First rough patch to follow shortly.