[ https://issues.apache.org/jira/browse/LUCENE-1389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrew Duffy updated LUCENE-1389:
---------------------------------

    Attachment: positions.patch

I've attached another diff, again from the trunk version. There is a slight optimisation - the span loop is broken early when a span is found at the current position. The main change is to start(String), though. Previously, it set currentPosition to 0, meaning every position was off by one and spans were not matched. It now starts currentPosition at -1 so the first token position ends up 0 as it should.

> SimpleSpanFragmenter can create very short fragments
> ----------------------------------------------------
>
>                 Key: LUCENE-1389
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1389
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/highlighter
>    Affects Versions: 2.3.2
>            Reporter: Andrew Duffy
>            Priority: Minor
>         Attachments: positions.patch, tailfragments.patch
>
>
> Line 74 of SimpleSpanFragmenter returns true when the current token is the
> start of a hit on a span or phrase, thus starting a new fragment. Two
> problems occur:
> - The previous fragment may be very short, but if it contains a hit it will
> be combined with the new fragment later so this disappears.
> - If the token is close to a natural fragment boundary the new fragment will
> end up very short; possibly even as short as just the span or phrase itself.
> This is the result of creating a new fragment without incrementing
> currentNumFrags.
> To fix, remove or comment out line 74. The result is that fragments average
> to the fragment size unless a span or phrase hit is towards the end of the
> fragment - that fragment is made larger and the following fragment shorter to
> accommodate the hit.
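
To illustrate the off-by-one described in the comment above, here is a minimal sketch. It is a simplified model and not the actual SimpleSpanFragmenter code or the contents of positions.patch: the class PositionSketch, the Span holder and the method tokensStartingSpan are hypothetical names made up for this example, and it assumes that a running position is advanced by each token's position increment before being compared with the span start positions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of the position counting discussed above.
// It does not use any Lucene classes.
class PositionSketch {

    // Hypothetical stand-in for a span hit: start and end token positions.
    static final class Span {
        final int start;
        final int end;

        Span(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    // Returns the indices of the tokens at which some span starts.
    // initialPosition plays the role of the value assigned to
    // currentPosition in start(String).
    static List<Integer> tokensStartingSpan(int[] positionIncrements,
                                            List<Span> spans,
                                            int initialPosition) {
        List<Integer> hits = new ArrayList<Integer>();
        int currentPosition = initialPosition;
        for (int i = 0; i < positionIncrements.length; i++) {
            // Assumption: the counter is advanced before it is compared.
            currentPosition += positionIncrements[i];
            for (Span span : spans) {
                if (span.start == currentPosition) {
                    hits.add(i);
                    break; // leave the span loop early once a span is found
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        int[] increments = {1, 1, 1, 1}; // four tokens, one position each
        List<Span> spans = Arrays.asList(new Span(0, 1), new Span(3, 3));

        // Starting at -1 puts the first token at position 0.
        System.out.println(tokensStartingSpan(increments, spans, -1)); // [0, 3]

        // Starting at 0 shifts every token one position too far.
        System.out.println(tokensStartingSpan(increments, spans, 0)); // [2]
    }
}
```

In this model, starting the counter at 0 misses the span at position 0 altogether and attributes the span at position 3 to the wrong token, while starting at -1 lines both up with the tokens they belong to.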