[ 
https://issues.apache.org/jira/browse/LUCENE-2102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784480#action_12784480
 ] 

Robert Muir commented on LUCENE-2102:
-------------------------------------

bq. The patch's TurkishLowerCaseFilter is as unflexible as that. The idea is 
just a replacement for the current patch (and it is even a little bit more 
universal, because you can change the chars to map).

Uwe, this is not true. With a TokenFilter, I can use Version to apply the 
logic I mentioned above:
bq. after finding a regular I (\u0049) we could search ahead for COMBINING DOT 
ABOVE (ignoring any nonspacing marks and format and such along the way), and 
handle this differently.

You cannot do this with MappingCharFilter; or rather, you could, but it would 
take millions of mappings for this one character. I could later patch this 
filter with Version and some lookahead based on Unicode properties if I wanted 
to improve it.
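
To make the lookahead idea concrete, here is a rough sketch in plain Java. It is 
not the code from the attached patch: the class and method names are made up for 
illustration, and it works on a String rather than on a token's term buffer. 
After a regular I (\u0049) it scans ahead for COMBINING DOT ABOVE (\u0307), 
skipping nonspacing marks, enclosing marks and format characters (treating 
enclosing marks as skippable is my assumption), and lowercases differently 
depending on what it finds.

```java
public final class TurkishLookaheadSketch {
    private static final int CAPITAL_I = 0x0049;           // LATIN CAPITAL LETTER I
    private static final int CAPITAL_I_WITH_DOT = 0x0130;  // LATIN CAPITAL LETTER I WITH DOT ABOVE
    private static final int SMALL_DOTLESS_I = 0x0131;     // LATIN SMALL LETTER DOTLESS I
    private static final int COMBINING_DOT_ABOVE = 0x0307; // COMBINING DOT ABOVE

    private TurkishLookaheadSketch() {}

    /**
     * Scans forward from {@code start} and returns the index of COMBINING DOT
     * ABOVE, or -1 if a character that is not a mark or format character is
     * reached first.
     */
    static int findDotAbove(CharSequence s, int start) {
        int i = start;
        while (i < s.length()) {
            int cp = Character.codePointAt(s, i);
            if (cp == COMBINING_DOT_ABOVE) {
                return i;
            }
            int type = Character.getType(cp);
            boolean skippable = type == Character.NON_SPACING_MARK
                    || type == Character.ENCLOSING_MARK
                    || type == Character.FORMAT;
            if (!skippable) {
                return -1;
            }
            i += Character.charCount(cp);
        }
        return -1;
    }

    /** Lowercases {@code in} using Turkish rules for I, with lookahead. */
    static String lowerCase(String in) {
        StringBuilder out = new StringBuilder(in.length());
        int i = 0;
        while (i < in.length()) {
            int cp = in.codePointAt(i);
            int next = i + Character.charCount(cp);
            if (cp == CAPITAL_I) {
                int dot = findDotAbove(in, next);
                if (dot >= 0) {
                    // I + (marks) + COMBINING DOT ABOVE: emit a dotted i,
                    // keep the intervening marks, and drop the combining dot.
                    out.append('i');
                    out.append(in, next, dot);
                    i = dot + 1; // U+0307 is a single UTF-16 code unit
                    continue;
                }
                out.appendCodePoint(SMALL_DOTLESS_I);
            } else if (cp == CAPITAL_I_WITH_DOT) {
                out.append('i');
            } else {
                out.appendCodePoint(Character.toLowerCase(cp));
            }
            i = next;
        }
        return out.toString();
    }
}
```

With a static mapping table, each combination of intervening marks between the 
I and the combining dot would need its own entry, while the loop above handles 
all of them with one property check.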

> LowerCaseFilter for Turkish language
> ------------------------------------
>
>                 Key: LUCENE-2102
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2102
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Analysis
>    Affects Versions: 3.0
>            Reporter: Ahmet Arslan
>            Assignee: Robert Muir
>            Priority: Minor
>             Fix For: 3.1
>
>         Attachments: LUCENE-2102.patch, LUCENE-2102.patch, LUCENE-2102.patch
>
>
> java.lang.Character.toLowerCase() converts 'I' to 'i'; however, in the Turkish 
> alphabet the lowercase of 'I' is not 'i'. It is LATIN SMALL LETTER DOTLESS I (U+0131).
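
For reference, the behavior described in the issue can be reproduced with the 
JDK alone. The class below is only an illustration (its name and methods are 
hypothetical); it contrasts the locale-insensitive Character.toLowerCase() with 
String.toLowerCase(Locale) under a Turkish locale.

```java
import java.util.Locale;

public final class TurkishCaseDemo {
    // Turkish locale; forLanguageTag has been available since Java 7.
    static final Locale TURKISH = Locale.forLanguageTag("tr-TR");

    private TurkishCaseDemo() {}

    /** Locale-insensitive lowercasing, as used by a plain lowercase filter. */
    static char defaultLower(char c) {
        return Character.toLowerCase(c);
    }

    /** Locale-sensitive lowercasing with Turkish casing rules. */
    static String turkishLower(String s) {
        return s.toLowerCase(TURKISH);
    }

    public static void main(String[] args) {
        // Print code points so the result does not depend on console encoding.
        System.out.printf("default: U+%04X%n", (int) defaultLower('I'));
        System.out.printf("turkish: U+%04X%n", (int) turkishLower("I").charAt(0));
    }
}
```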
