[ https://issues.apache.org/jira/browse/LUCENE-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Muir updated LUCENE-2219:
--------------------------------

    Attachment: LUCENE-2219.patch

This fixes contrib too, as long as you apply the CJKTokenizer fix from LUCENE-2207. end() was incorrect for ChineseTokenizer, SmartChinese, and Wikipedia.

> improve BaseTokenStreamTestCase to test end()
> ---------------------------------------------
>
>                 Key: LUCENE-2219
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2219
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Analysis, contrib/analyzers
>    Affects Versions: 3.0
>            Reporter: Robert Muir
>         Attachments: LUCENE-2219.patch, LUCENE-2219.patch
>
>
> If offsetAtt/end() is not implemented correctly, then there can be problems with highlighting: see LUCENE-2207 for an example with CJKTokenizer.
> In my opinion you currently have to write too much code to test this.
> This patch does the following:
> * adds an optional Integer finalOffset (can be null for no checking) to assertTokenStreamContents
> * in assertAnalyzesTo, automatically fills this with the String's length()
> In my opinion this is correct: for assertTokenStreamContents the check should be optional, since there may not even be a Tokenizer involved. If you are using assertTokenStreamContents with a Tokenizer, simply provide the extra expected value to check it.
> For assertAnalyzesTo it is implied that there is a Tokenizer, so it should always be checked.
> The tests pass for core, but there are failures in contrib even besides CJKTokenizer (apply Koji's patch from LUCENE-2207, it is correct). Specifically, ChineseTokenizer has a similar problem.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
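To illustrate the kind of bug the patch catches: after a TokenStream is exhausted, end() should leave the offset attribute pointing just past the last character read (including any trailing characters that produced no token), and the issue proposes asserting that this final offset equals the input's length(). The sketch below is a self-contained toy (SimpleTokenizer, tokenizeAndCheck, and finalOffset are illustrative names, not Lucene's actual API) showing the check in isolation.

```java
import java.util.ArrayList;
import java.util.List;

public class FinalOffsetCheck {
    // Toy whitespace tokenizer that tracks the final offset, standing in
    // for a Lucene Tokenizer's OffsetAttribute + end() contract.
    static class SimpleTokenizer {
        private final String input;
        private int pos = 0;
        private int finalOffset = 0;

        SimpleTokenizer(String input) { this.input = input; }

        // Returns the next token, or null when the input is exhausted.
        String next() {
            while (pos < input.length() && input.charAt(pos) == ' ') pos++;
            if (pos >= input.length()) {
                // Correct behavior: account for trailing whitespace too,
                // so the final offset covers the whole input.
                finalOffset = input.length();
                return null;
            }
            int start = pos;
            while (pos < input.length() && input.charAt(pos) != ' ') pos++;
            finalOffset = pos;
            return input.substring(start, pos);
        }

        // Analogous to TokenStream.end(): report the offset past the last
        // character consumed. A buggy tokenizer returns a stale value here.
        int end() { return finalOffset; }
    }

    // The assertion the patch adds to the test harness: after consuming
    // all tokens, the final offset must equal the input length, otherwise
    // highlighters compute wrong end positions.
    static List<String> tokenizeAndCheck(String text) {
        SimpleTokenizer t = new SimpleTokenizer(text);
        List<String> tokens = new ArrayList<>();
        for (String tok = t.next(); tok != null; tok = t.next()) tokens.add(tok);
        if (t.end() != text.length())
            throw new AssertionError("end() reported wrong final offset: "
                    + t.end() + " != " + text.length());
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenizeAndCheck("foo bar baz"));
    }
}
```

If end() forgot to set finalOffset to input.length() on exhaustion, an input with trailing spaces would fail the check, which is exactly the class of bug seen in CJKTokenizer and ChineseTokenizer.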