[jira] Commented: (LUCENE-1216) CharDelimiterTokenizer

Otis Gospodnetic (JIRA) Thu, 15 May 2008 09:16:19 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12597178#action_12597178 ]


Otis Gospodnetic commented on LUCENE-1216:
------------------------------------------

Aha, that makes sense - thanks for clarifying.  I think I'm not the only one 
who won't immediately realize that setWhitespaceDelimiter delimits on all 
isWhitespace characters, so it would be good to add that to the javadoc.
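Here is a small sketch of what "all isWhitespace characters" covers. It assumes the tokenizer's check is java.lang.Character.isWhitespace(char). The attached class is not reproduced here, so that is an assumption. The class name and the javadoc wording are hypothetical and only suggest what such a javadoc note could say.

```java
public class WhitespaceDelimiterDemo {

    /**
     * Hypothetical delimiter test used when whitespace delimiting is enabled.
     * Every char for which Character.isWhitespace returns true is a
     * delimiter: space, tab, LF, CR, form feed, and Unicode space
     * separators such as U+3000 (ideographic space). No-break spaces
     * such as U+00A0 are NOT treated as whitespace by that method.
     */
    public static boolean isWhitespaceDelimiter(char c) {
        return Character.isWhitespace(c);
    }

    public static void main(String[] args) {
        // space, tab, newline, ideographic space, no-break space, letter
        char[] samples = {' ', '\t', '\n', '\u3000', '\u00A0', 'a'};
        for (char c : samples) {
            System.out.printf("U+%04X -> %b%n", (int) c, isWhitespaceDelimiter(c));
        }
    }
}
```

U+3000 reports true and U+00A0 reports false. For Japanese text in particular, this distinction may be worth spelling out in the javadoc.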

Could you please do that and upload the new class + its unit test class as a 
patch?

Thanks!


> CharDelimiterTokenizer
> ----------------------
>
>                 Key: LUCENE-1216
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1216
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Analysis
>            Reporter: Hiroaki Kawai
>            Assignee: Otis Gospodnetic
>            Priority: Minor
>         Attachments: CharDelimiterTokenizer.java, CharDelimiterTokenizer.java, TestCharDelimiterTokenizer.java
>
>
> WhitespaceTokenizer is very useful for space-separated languages, but my
> Japanese text is not always separated by spaces. So I created an
> alternative Tokenizer in which the delimiter can be specified. The submitted
> file is intended as an improvement on the current WhitespaceTokenizer.
> I tried to extend CharTokenizer instead, but CharTokenizer has a limitation:
> a token can't be longer than 255 chars.
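The idea described in the issue can be sketched in plain Java as follows. This is not the attached CharDelimiterTokenizer.java, and it does not use Lucene's Tokenizer API. The class name CharDelimiterSplitter, its constructor, and the tokenize(Reader) method are hypothetical. The sketch assumes the delimiters are given as a string of characters, plus an optional flag that also splits on Character.isWhitespace. Each token accumulates in a growable StringBuilder, so there is no fixed 255-char buffer.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical splitter illustrating a configurable-delimiter tokenizer. */
public class CharDelimiterSplitter {
    private final String delimiters;            // explicit delimiter chars, e.g. "、。"
    private final boolean whitespaceDelimiter;  // also split on Character.isWhitespace

    public CharDelimiterSplitter(String delimiters, boolean whitespaceDelimiter) {
        this.delimiters = delimiters;
        this.whitespaceDelimiter = whitespaceDelimiter;
    }

    private boolean isDelimiter(char c) {
        return delimiters.indexOf(c) >= 0
                || (whitespaceDelimiter && Character.isWhitespace(c));
    }

    /** Splits the input into non-empty tokens; token length is not capped. */
    public List<String> tokenize(Reader in) throws IOException {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder(); // grows as needed
        int ch;
        while ((ch = in.read()) != -1) {
            char c = (char) ch;
            if (isDelimiter(c)) {
                // a delimiter ends the current token; runs of delimiters yield nothing
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString()); // flush the last token at end of input
        }
        return tokens;
    }

    public static void main(String[] args) throws IOException {
        CharDelimiterSplitter splitter = new CharDelimiterSplitter("、。", true);
        // prints [東京, 大阪, 京都]
        System.out.println(splitter.tokenize(new StringReader("東京、大阪。 京都")));
    }
}
```

Reading one char at a time keeps the sketch short. For real input you would probably wrap the Reader in a BufferedReader.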


