[ https://issues.apache.org/jira/browse/LUCENE-2102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784480#action_12784480 ]
Robert Muir commented on LUCENE-2102:
-------------------------------------

bq. The patch's TurkishLowerCaseFilter is as inflexible as that. The idea is just a replacement for the current patch (and it is even a little bit more universal, because you can change the chars to map).

Uwe, this is not true. With a TokenFilter, I can use Version to apply the logic I mentioned above:

bq. after finding a regular I (\u0049) we could search ahead for COMBINING DOT ABOVE (ignoring any nonspacing marks and format and such along the way), and handle this differently.

You cannot do this with MappingCharFilter. Or rather, you could, but there would be millions of mappings for this one character. I could later patch this filter with Version and some lookahead based on Unicode properties if I wanted to improve it.

> LowerCaseFilter for Turkish language
> ------------------------------------
>
> Key: LUCENE-2102
> URL: https://issues.apache.org/jira/browse/LUCENE-2102
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Analysis
> Affects Versions: 3.0
> Reporter: Ahmet Arslan
> Assignee: Robert Muir
> Priority: Minor
> Fix For: 3.1
>
> Attachments: LUCENE-2102.patch, LUCENE-2102.patch, LUCENE-2102.patch
>
> java.lang.Character.toLowerCase() converts 'I' to 'i'; however, in the Turkish
> alphabet the lowercase of 'I' is not 'i'. It is LATIN SMALL LETTER DOTLESS I.
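To make the lookahead idea from the comment concrete, here is a minimal plain-Java sketch of Turkish lowercasing. It uses only java.lang, not Lucene's TokenFilter or Version API, and the class name TurkishLowerCaseSketch and its methods are hypothetical. It assumes three rules: 'I' (U+0049) becomes dotless 'ı' (U+0131), 'İ' (U+0130) becomes 'i', and 'I' followed by COMBINING DOT ABOVE (U+0307), with any nonspacing marks or format characters skipped along the way as the comment describes, becomes 'i' with the combining dot dropped.

```java
// Hypothetical sketch, not part of Lucene: Turkish-aware lowercasing with a
// lookahead for COMBINING DOT ABOVE after a plain capital I.
public final class TurkishLowerCaseSketch {
    private static final int LATIN_CAPITAL_I = 0x0049;          // 'I'
    private static final int LATIN_CAPITAL_I_WITH_DOT = 0x0130; // 'İ'
    private static final int LATIN_SMALL_DOTLESS_I = 0x0131;    // 'ı'
    private static final int COMBINING_DOT_ABOVE = 0x0307;

    private TurkishLowerCaseSketch() {}

    /** Lowercases text using Turkish rules for dotted and dotless I. */
    public static String toLowerCaseTurkish(String text) {
        StringBuilder out = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (cp == LATIN_CAPITAL_I) {
                int dot = findCombiningDotAbove(text, i);
                if (dot >= 0) {
                    // "I" + (skipped marks) + U+0307 spells a dotted capital I:
                    // emit 'i', keep the skipped marks, drop the combining dot.
                    out.append('i');
                    out.append(text, i, dot);
                    i = dot + 1; // U+0307 is a single BMP char
                } else {
                    out.appendCodePoint(LATIN_SMALL_DOTLESS_I);
                }
            } else if (cp == LATIN_CAPITAL_I_WITH_DOT) {
                out.append('i');
            } else {
                out.appendCodePoint(Character.toLowerCase(cp));
            }
        }
        return out.toString();
    }

    /**
     * Returns the index of U+0307 at or after start, skipping nonspacing marks
     * and format characters; returns -1 if any other character comes first.
     */
    private static int findCombiningDotAbove(String text, int start) {
        int j = start;
        while (j < text.length()) {
            int cp = text.codePointAt(j);
            if (cp == COMBINING_DOT_ABOVE) {
                return j;
            }
            int type = Character.getType(cp);
            if (type != Character.NON_SPACING_MARK && type != Character.FORMAT) {
                return -1;
            }
            j += Character.charCount(cp);
        }
        return -1;
    }
}
```

For example, `toLowerCaseTurkish("ISTANBUL")` yields "ıstanbul", while `toLowerCaseTurkish("I\u0307")` yields "i". The result for 'I' depends on what follows it, which is why a context-free, per-character mapping table cannot express this rule without enumerating every possible sequence. Note that Unicode's SpecialCasing.txt states the corresponding Turkish conditions (Not_Before_Dot, After_I) more precisely in terms of canonical combining classes, so this sketch is only an approximation of those rules.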