[
https://issues.apache.org/jira/browse/LUCENE-5381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
yuanyun.cn updated LUCENE-5381:
-------------------------------
Attachment: LUCENE-5381.patch
> Lucene highlighter doesn't honor hl.fragsize; it appends all text for last
> fragment
> -----------------------------------------------------------------------------------
>
> Key: LUCENE-5381
> URL: https://issues.apache.org/jira/browse/LUCENE-5381
> Project: Lucene - Core
> Issue Type: Bug
> Components: modules/highlighter
> Affects Versions: 4.0, 4.6
> Reporter: yuanyun.cn
> Priority: Minor
> Labels: highlighter, lucene
> Fix For: 5.0, 4.7
>
> Attachments: LUCENE-5381.patch
>
> Original Estimate: 4h
> Remaining Estimate: 4h
>
> Recently we hit a problem with the highlighter: I set hl.fragsize = 300,
> but the highlighted section for one document outputs more than 2000 characters.
> Looking into the code, in
> org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(TokenStream,
> String, boolean, int), after the for loop, it appends all of the remaining
> text to the last fragment:
> if (
>     // if there is text beyond the last token considered..
>     (lastEndOffset < text.length())
>     &&
>     // and that text is not too large...
>     (text.length() <= maxDocCharsToAnalyze)
> )
> {
>   // append it to the last fragment
>   newText.append(encoder.encodeText(text.substring(lastEndOffset)));
> }
> currentFrag.textEndPos = newText.length();
> This code is problematic because in some cases the last fragment is the most
> relevant section and is the one selected and returned to the client, so the
> returned highlight exceeds hl.fragsize.
> I changed the code as shown below. It seems to work for me :)
> // Test what remains of the original text beyond the point where we
> // stopped analyzing
> if (lastEndOffset < text.length())
> {
>   if (textFragmenter instanceof SimpleFragmenter)
>   {
>     SimpleFragmenter simpleFragmenter = (SimpleFragmenter) textFragmenter;
>     int remain = simpleFragmenter.getFragmentSize()
>         - (newText.length() - currentFrag.textStartPos);
>     if (remain > 0)
>     {
>       int endIndex = lastEndOffset + remain;
>       if (endIndex > text.length()) {
>         endIndex = text.length();
>       }
>       newText.append(encoder.encodeText(text.substring(lastEndOffset, endIndex)));
>     }
>   }
>   else
>   {
>     newText.append(encoder.encodeText(text.substring(lastEndOffset)));
>   }
> }
> currentFrag.textEndPos = newText.length();
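
To make the difference between the two snippets concrete, here is a minimal standalone sketch of the tail-capping idea. It is not the Lucene code and not the attached patch: the class TailCapSketch and its two methods are hypothetical, there is no encoder, and the fragment is assumed to start at offset 0 of newText (that is, currentFrag.textStartPos == 0).

```java
public class TailCapSketch {

    /** Mirrors the original behavior: append everything after lastEndOffset. */
    public static String appendWholeTail(String fragmentSoFar, String text, int lastEndOffset) {
        StringBuilder newText = new StringBuilder(fragmentSoFar);
        if (lastEndOffset < text.length()) {
            newText.append(text, lastEndOffset, text.length());
        }
        return newText.toString();
    }

    /**
     * Mirrors the proposed change: append only as much of the tail as is
     * needed to fill the fragment up to fragmentSize characters.
     * Assumption: the fragment begins at offset 0 of newText.
     */
    public static String appendCappedTail(String fragmentSoFar, String text,
                                          int lastEndOffset, int fragmentSize) {
        StringBuilder newText = new StringBuilder(fragmentSoFar);
        if (lastEndOffset < text.length()) {
            int remain = fragmentSize - newText.length();
            if (remain > 0) {
                // Never read past the end of the source text.
                int endIndex = Math.min(lastEndOffset + remain, text.length());
                newText.append(text, lastEndOffset, endIndex);
            }
        }
        return newText.toString();
    }

    public static void main(String[] args) {
        // Hypothetical document: a short analyzed prefix followed by a
        // 2000-character tail that lies beyond the last token considered.
        String prefix = "lucene highlighter";
        StringBuilder doc = new StringBuilder(prefix);
        for (int i = 0; i < 2000; i++) {
            doc.append('x');
        }
        String text = doc.toString();
        int lastEndOffset = prefix.length();

        System.out.println(appendWholeTail(prefix, text, lastEndOffset).length());       // 2018
        System.out.println(appendCappedTail(prefix, text, lastEndOffset, 300).length()); // 300
    }
}
```

With a fragment size of 300, the uncapped version returns the whole 2018-character document as the last fragment, while the capped version stops at 300 characters. If the fragment is already at or past the size limit, nothing more is appended.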