[ https://issues.apache.org/jira/browse/LUCENE-2102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784472#action_12784472 ]
Uwe Schindler commented on LUCENE-2102:
---------------------------------------

One other possibility is to resolve the problem in a completely different way: you could wrap a MappingCharFilter around the input Reader in the Analyzer and add a replacement for just this one char:

[http://lucene.apache.org/java/3_0_0/api/all/org/apache/lucene/analysis/MappingCharFilter.html]

This would be a very easy fix without code duplication, because you only change the input before tokenization. It is already in Lucene core: just plug it into the analyzer's tokenStream() or reusableTokenStream() method as a wrapper around the Reader parameter.

The same approach would also be easy to apply to other analyzers that have problems with rare characters. It can also be used to remove characters entirely or to replace them with longer sequences, such as ä -> ae (for German).

> LowerCaseFilter for Turkish language
> ------------------------------------
>
> Key: LUCENE-2102
> URL: https://issues.apache.org/jira/browse/LUCENE-2102
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Analysis
> Affects Versions: 3.0
> Reporter: Ahmet Arslan
> Assignee: Robert Muir
> Priority: Minor
> Fix For: 3.1
>
> Attachments: LUCENE-2102.patch, LUCENE-2102.patch, LUCENE-2102.patch
>
>
> java.lang.Character.toLowerCase() converts 'I' to 'i'; however, in the Turkish
> alphabet the lowercase of 'I' is not 'i'. It is LATIN SMALL LETTER DOTLESS I.
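
As a rough illustration of the idea in the comment above (remap the one problematic character in the input before it reaches the tokenizer and the lowercasing step), here is a minimal sketch using only the JDK. It does not use Lucene: SingleCharMappingReader, plainLowerCase and mappedLowerCase are hypothetical names invented for this example, and the reader is only a stand-in for the role MappingCharFilter would play when wrapped around the Reader inside an Analyzer.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

public class CharMappingSketch {

    /** Hypothetical stand-in for a char filter: rewrites single chars as they are read. */
    static class SingleCharMappingReader extends FilterReader {
        private final Map<Character, Character> mapping;

        SingleCharMappingReader(Reader in, Map<Character, Character> mapping) {
            super(in);
            this.mapping = mapping;
        }

        @Override
        public int read() throws IOException {
            int c = super.read();
            if (c == -1) {
                return -1; // end of stream
            }
            Character mapped = mapping.get((char) c);
            return mapped != null ? mapped : c;
        }

        @Override
        public int read(char[] buf, int off, int len) throws IOException {
            int n = super.read(buf, off, len);
            // When n is -1 (end of stream) the loop body never runs.
            for (int i = off; i < off + n; i++) {
                Character mapped = mapping.get(buf[i]);
                if (mapped != null) {
                    buf[i] = mapped;
                }
            }
            return n;
        }
    }

    /** Lowercases everything from the reader with Character.toLowerCase(). */
    private static String lowerCaseAll(Reader input) {
        StringBuilder out = new StringBuilder();
        try {
            int c;
            while ((c = input.read()) != -1) {
                out.append(Character.toLowerCase((char) c));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    /** Locale-insensitive lowercasing: 'I' becomes 'i'. */
    public static String plainLowerCase(String text) {
        return lowerCaseAll(new StringReader(text));
    }

    /** Same lowercasing, but 'I' is first remapped to U+0131 (dotless i). */
    public static String mappedLowerCase(String text) {
        Map<Character, Character> mapping = new HashMap<>();
        mapping.put('I', '\u0131'); // the single replacement the comment suggests
        return lowerCaseAll(new SingleCharMappingReader(new StringReader(text), mapping));
    }

    public static void main(String[] args) {
        System.out.println(plainLowerCase("ISPARTA"));  // isparta
        System.out.println(mappedLowerCase("ISPARTA")); // first letter is U+0131
    }
}
```

The lowercasing code itself is unchanged in both paths; only the input is rewritten, which is why this approach avoids duplicating filter code. In Lucene the remapping would be done by MappingCharFilter rather than a hand-written reader.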