LargeDocHighlighter - another span highlighter optimized for large documents
----------------------------------------------------------------------------
Key: LUCENE-1286
URL: https://issues.apache.org/jira/browse/LUCENE-1286
Project: Lucene - Java
Issue Type: Improvement
Components: contrib/highlighter
Affects Versions: 2.4
Reporter: Mark Miller

The existing Highlighter API is rich and well designed, but the approach it takes is not very efficient for large documents. I believe this is because the current Highlighter rebuilds the document by running through and scoring every token in the TokenStream.

With a break in the current API, an alternate approach can be taken: rebuild the document by running through the query terms, using their offsets. The benefit is clear - a large document will have a large TokenStream, but a query will likely be very small in comparison.

I expect this approach to be quite a bit faster for very large documents, while still supporting Phrase and Span queries.

First rough patch to follow shortly.
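
To make the offset-driven idea above concrete, here is a minimal sketch in plain Java. It is not the patch for this issue and uses no Lucene classes: the names OffsetHighlightSketch, Hit and highlight are hypothetical, and the sketch assumes the character offsets of the matching query terms have already been obtained by the caller. The point is only that the loop runs once per query-term hit rather than once per token in the document.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; not part of the Lucene Highlighter API.
class OffsetHighlightSketch {

    /** Character span [start, end) of one query-term hit in the stored text. */
    static final class Hit {
        final int start;
        final int end;

        Hit(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    /**
     * Rebuilds the text by walking the hits in offset order and copying the
     * untouched text between them. Work is proportional to the number of
     * hits plus the text that is copied; no tokens are scored.
     */
    static String highlight(String text, List<Hit> hits, String pre, String post) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingInt((Hit h) -> h.start));

        StringBuilder out = new StringBuilder(text.length());
        int cursor = 0; // end of the last span copied into 'out'
        for (Hit hit : sorted) {
            // Assumption: overlapping, empty, or out-of-range hits are simply skipped.
            if (hit.start < cursor || hit.start >= hit.end || hit.end > text.length()) {
                continue;
            }
            out.append(text, cursor, hit.start);   // plain text before the hit
            out.append(pre);
            out.append(text, hit.start, hit.end);  // the matched term itself
            out.append(post);
            cursor = hit.end;
        }
        out.append(text, cursor, text.length());   // trailing plain text
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "the quick brown fox jumps over the lazy dog";
        // Offsets for "fox" and "quick", deliberately given out of order.
        List<Hit> hits = Arrays.asList(new Hit(16, 19), new Hit(4, 9));
        System.out.println(highlight(text, hits, "<b>", "</b>"));
    }
}
```

How the hits are collected, in particular for Phrase and Span queries where only terms at matching positions should count, is left out of this sketch; it covers only the rebuild step.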