[
https://issues.apache.org/jira/browse/LUCENE-2874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Pierre Gossé updated LUCENE-2874:
---------------------------------
Attachment: LUCENE-2874.patch
I couldn't get the Eclipse coding conventions from the wiki; the link seems to
lead to an error:
"You are not allowed to do AttachFile on this page. Login and try again."
Sorry for the many differences in the diff; the changed part is on lines 251
and 152 of the new file.
> Highlighting overlapping tokens outputs doubled words
> -----------------------------------------------------
>
> Key: LUCENE-2874
> URL: https://issues.apache.org/jira/browse/LUCENE-2874
> Project: Lucene - Java
> Issue Type: Bug
> Reporter: Pierre Gossé
> Attachments: LUCENE-2874.patch
>
>
> If, for the text "the fox did not jump", we generate the following tokens
> (term, position, start-end offsets):
> (the,0,0-3),({fox},0,0-7),(fox,1,4-7),(did,2,8-11),(not,3,12-15),(jump,4,16-20)
> and the TermVector for the field is stored WITH_OFFSETS rather than
> WITH_POSITIONS_OFFSETS, highlighting outputs
> "the<em>the fox</em> did not jump"
> I am attaching a patch with 2 additional JUnit tests and a fix to the
> TokenSources class, where ordering tokens by offset didn't handle overlapping
> tokens well.
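To illustrate the ordering problem described above, here is a minimal,
self-contained sketch. It is not the attached patch and does not use Lucene:
the Tok class, the highlight method, and the tie-breaking rule (on equal start
offsets, the longer token first) are assumptions made for illustration only.
The end offset of "jump" is taken as 20, the length of the sample text. The
idea is to sort matching tokens by offset and skip any token that starts
inside text already written, so an overlapping token cannot emit the same
words twice.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class OverlapSketch {
    /** Hypothetical token holder for this sketch; NOT Lucene's Token class. */
    public static final class Tok {
        public final String term;
        public final int start; // start offset, inclusive
        public final int end;   // end offset, exclusive

        public Tok(String term, int start, int end) {
            this.term = term;
            this.start = start;
            this.end = end;
        }
    }

    /** Wraps matching tokens in <em> tags without repeating overlapped text. */
    public static String highlight(String text, List<Tok> tokens, Set<String> terms) {
        // Keep only the tokens whose term should be highlighted.
        List<Tok> hits = new ArrayList<>();
        for (Tok t : tokens) {
            if (terms.contains(t.term)) {
                hits.add(t);
            }
        }
        // Sort by start offset; on a tie, the longer (enclosing) token comes first.
        hits.sort((a, b) -> a.start != b.start
                ? Integer.compare(a.start, b.start)
                : Integer.compare(b.end, a.end));

        StringBuilder out = new StringBuilder();
        int cursor = 0; // end of the source text already written to out
        for (Tok t : hits) {
            if (t.start < cursor) {
                continue; // starts inside text already emitted: skip it
            }
            out.append(text, cursor, t.start);
            out.append("<em>").append(text, t.start, t.end).append("</em>");
            cursor = t.end;
        }
        out.append(text, cursor, text.length());
        return out.toString();
    }

    public static void main(String[] args) {
        List<Tok> tokens = Arrays.asList(
                new Tok("the", 0, 3), new Tok("{fox}", 0, 7), new Tok("fox", 4, 7),
                new Tok("did", 8, 11), new Tok("not", 12, 15), new Tok("jump", 16, 20));
        Set<String> terms = new HashSet<>(Arrays.asList("{fox}", "fox"));
        // Prints: <em>the fox</em> did not jump
        System.out.println(highlight("the fox did not jump", tokens, terms));
    }
}
```

Skipping overlapped tokens is only one possible policy; merging the overlapping
spans into a single highlighted region would be another.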