[
https://issues.apache.org/jira/browse/LUCENE-644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12557800#action_12557800
]
Grant Ingersoll commented on LUCENE-644:
----------------------------------------
Is this still an issue? Does this speedup still apply?
> Contrib: another highlighter approach
> -------------------------------------
>
> Key: LUCENE-644
> URL: https://issues.apache.org/jira/browse/LUCENE-644
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Other
> Reporter: Ronnie Kolehmainen
> Priority: Minor
> Attachments: FulltextHighlighter.java, FulltextHighlighter.java,
> FulltextHighlighterTest.java, FulltextHighlighterTest.java, svn-diff.patch,
> svn-diff.patch, TokenSources.java, TokenSources.java.diff
>
>
> Mark Harwood's highlighter package is a great contribution to Lucene, and
> I've used it a lot! However, when you have *large* documents (fields),
> highlighting can be quite time-consuming if you increase the number of bytes
> to analyze with setMaxDocBytesToAnalyze(int). The default value of 50k is
> often too low for indexed PDFs etc., which results in empty highlight
> strings.
> This is an alternative approach that uses only term position vectors to
> build fragment info objects. A StringReader can then read the relevant
> fragments and skip() between them. This is a lot faster. Also, this method
> uses the *entire* field to find the best fragments, so you're always
> guaranteed to get a highlight snippet.
> Because this method only works with fields that have term positions stored,
> one can check whether it works for a particular field using the following
> code (taken from TokenSources.java):
> TermFreqVector tfv = (TermFreqVector) reader.getTermFreqVector(docId, field);
> if (tfv != null && tfv instanceof TermPositionVector)
> {
>     // use FulltextHighlighter
> }
> else
> {
>     // use standard Highlighter
> }
> Someone else might find this useful so I'm posting the code here.
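
For readers without the attachments at hand, the "read the relevant fragments and skip() between them" step described above might look roughly like the following. This is a minimal sketch using only the JDK, not the code from FulltextHighlighter.java: the names FragmentReaderSketch, Fragment, and readFragments are hypothetical, and it assumes the fragment character offsets have already been derived from the term position vector and are sorted and non-overlapping.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene or of the attached FulltextHighlighter.java.
final class FragmentReaderSketch {

    /** Hypothetical fragment descriptor: character offsets [start, end) into the field text. */
    static final class Fragment {
        final int start;
        final int end;

        Fragment(int start, int end) {
            if (start < 0 || end < start) {
                throw new IllegalArgumentException("invalid fragment offsets");
            }
            this.start = start;
            this.end = end;
        }
    }

    /**
     * Reads only the requested fragments from the field text and skips the
     * text between them. Assumes fragments are sorted by start offset and
     * do not overlap.
     */
    static List<String> readFragments(String fieldText, List<Fragment> fragments) {
        List<String> result = new ArrayList<>();
        try (StringReader reader = new StringReader(fieldText)) {
            long position = 0; // number of characters consumed so far
            for (Fragment f : fragments) {
                if (f.start < position) {
                    throw new IllegalArgumentException("fragments must be sorted and non-overlapping");
                }
                // Skip the uninteresting text before this fragment.
                long toSkip = f.start - position;
                while (toSkip > 0) {
                    long skipped = reader.skip(toSkip);
                    if (skipped <= 0) {
                        return result; // end of text reached before the fragment starts
                    }
                    toSkip -= skipped;
                }
                // Read the fragment itself; it may be cut short at the end of the text.
                char[] buffer = new char[f.end - f.start];
                int filled = 0;
                while (filled < buffer.length) {
                    int n = reader.read(buffer, filled, buffer.length - filled);
                    if (n < 0) {
                        break;
                    }
                    filled += n;
                }
                result.add(new String(buffer, 0, filled));
                position = (long) f.start + filled;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }
}
```

With a plain String the skipping is cheap either way; the point of the sketch is only that the text outside the chosen fragments is never tokenized or copied, which is where the description above expects the time savings to come from.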