[ https://issues.apache.org/jira/browse/LUCENE-973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12634363#action_12634363 ]
Toru Matsuzawa commented on LUCENE-973:
---------------------------------------

Thank you for your comments, Sekiguchi-san and Steven. I am sorry for the slow reply.

{quote}
The following part of your patch appears to address a problem that you haven't covered in your comments - is this so? If it is a problem separate from the empty-string issue, can you describe the effects of this change?:
{quote}

In the current CJKTokenizer, the non-ASCII character "C3" is emitted with type "single", as the following example shows.

{noformat}
// C1, C2 and C3 each stand for a non-ASCII character
String str = "C1C2abcC3def";
Tokenizer tokenizer = new CJKTokenizer( new StringReader( str ) );
for( Token token = tokenizer.next(); token != null; token = tokenizer.next() )
    System.out.println( "token=\"" + token.termText() + "\"" + " type=\"" + token.type() + "\"" );
{noformat}

The current CJKTokenizer outputs:

{noformat}
token="C1C2" type="double"
token="" type="single"
token="abc" type="single"
token="C3" type="single"
token="def" type="single"
{noformat}

With the patch applied, the output is:

{noformat}
token="C1C2" type="double"
token="C2" type="double"
token="abc" type="single"
token="C3" type="double"
token="def" type="single"
{noformat}

{quote}
Wouldn't it be simpler/clearer to test length for zero instead of constructing a String and testing it for equality with the empty string?:
{quote}

I think that your correction is better.

> Token of "" returns in CJK
> ---------------------------
>
>                 Key: LUCENE-973
>                 URL: https://issues.apache.org/jira/browse/LUCENE-973
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Analysis
>    Affects Versions: 2.3
>            Reporter: Toru Matsuzawa
>         Attachments: CJKTokenizer20070807.patch, with-patch.jpg, without-patch.jpg
>
>
> An empty string ("") is returned as a Token at the boundary between a two-byte character and a one-byte character.
> There is no problem with CJKAnalyzer.
> It becomes a problem when CJKTokenizer is used on its own (with Solr, for example).
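
For reference, the review suggestion quoted in the comment (testing the length for zero rather than constructing a String and comparing it with "") can be sketched in plain Java as follows. This is only an illustration and not the code from the attached patch: the class name EmptyTokenCheck, both method names, and the char[] buffer with a separate length counter are assumptions made for the example.

```java
// Illustrative sketch only; not taken from CJKTokenizer or the attached patch.
// Assumes (hypothetically) that pending token characters are held in a char[]
// buffer together with a separate length counter.
class EmptyTokenCheck {

    // Builds a temporary String just to compare it with the empty string.
    static boolean isEmptyByString(char[] buffer, int length) {
        return new String(buffer, 0, length).equals("");
    }

    // Tests the counter directly; no temporary object is created.
    static boolean isEmptyByLength(int length) {
        return length == 0;
    }

    public static void main(String[] args) {
        char[] buffer = new char[16];
        int length = 0;

        // Nothing buffered yet: both checks report "empty".
        System.out.println(isEmptyByString(buffer, length) + " " + isEmptyByLength(length));

        // After buffering one character, both checks report "not empty".
        buffer[length++] = 'a';
        System.out.println(isEmptyByString(buffer, length) + " " + isEmptyByLength(length));
    }
}
```

Both checks give the same answer for any valid length; the length test simply avoids building a String that is discarded immediately.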