[ 
https://issues.apache.org/jira/browse/LUCENE-2102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784907#action_12784907
 ] 

Simon Willnauer commented on LUCENE-2102:
-----------------------------------------

Robert, I see your point. The root cause why this bugs me is that this 
TokenFilter changes his behavior (at least if you index deseret with this 
analyzer) :) depending on the passed version. I don't think that new code 
should try do anything based on version. The Version ctor it totally ok for me 
in this case but we should really use Integer.CodePointAt() instead of 
CharacterUtil. Once I think about the mess ensureOnOrAfter would create 
throughout all the code I doubt it would to any good in the end.



> LowerCaseFilter for Turkish language
> ------------------------------------
>
>                 Key: LUCENE-2102
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2102
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Analysis
>    Affects Versions: 3.0
>            Reporter: Ahmet Arslan
>            Assignee: Robert Muir
>            Priority: Minor
>             Fix For: 3.1
>
>         Attachments: LUCENE-2102.patch, LUCENE-2102.patch, LUCENE-2102.patch, 
> LUCENE-2102.patch, LUCENE-2102.patch, LUCENE-2102.patch
>
>
> java.lang.Character.toLowerCase() converts 'I' to 'i' however in Turkish 
> alphabet lowercase of 'I' is not 'i'. It is LATIN SMALL LETTER DOTLESS I.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

Reply via email to