[ https://issues.apache.org/jira/browse/LUCENE-2102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784483#action_12784483 ]
Uwe Schindler edited comment on LUCENE-2102 at 12/1/09 10:36 PM:
-----------------------------------------------------------------
If I replace this code from Ahmet's test
{code}
public class TestTurkishLowerCaseFilter extends BaseTokenStreamTestCase {
  public void testTurkishLowerCaseFilter() throws Exception {
    TokenStream stream = new WhitespaceTokenizer(
        new StringReader("\u0130STANBUL \u0130ZM\u0130R ISPARTA"));
    TokenStream filter = new TurkishLowerCaseFilter(Version.LUCENE_30, stream);
    assertTokenStreamContents(filter, new String[] {"istanbul", "izmir",
        "\u0131sparta",});
  }
}
{code}
with the following, not even a new class is needed:
{code}
public class TestTurkishLowerCaseFilter extends BaseTokenStreamTestCase {
  static final NormalizeCharMap map = new NormalizeCharMap();
  static {
    // Map LATIN CAPITAL LETTER I (U+0049) to LATIN SMALL LETTER DOTLESS I (U+0131).
    map.add("\u0049", "\u0131");
  }

  public void testTurkishLowerCaseFilter() throws Exception {
    // The char filter rewrites the input before the tokenizer sees it.
    TokenStream stream = new WhitespaceTokenizer(
        new MappingCharFilter(map,
            new StringReader("\u0130STANBUL \u0130ZM\u0130R ISPARTA")));
    // The stock LowerCaseFilter handles the remaining characters.
    TokenStream filter = new LowerCaseFilter(Version.LUCENE_30, stream);
    assertTokenStreamContents(filter, new String[] {"istanbul", "izmir",
        "\u0131sparta",});
  }
}
{code}
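It just works: the char filter maps 'I' (U+0049) to dotless 'ı' (U+0131) before tokenization, and LowerCaseFilter already lowercases 'İ' (U+0130) to a plain 'i'.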
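For anyone who wants to check the idea without a Lucene checkout, here is a minimal sketch in plain Java. It assumes the filter chain boils down to "replace 'I' with U+0131 first, then lowercase every char with Character.toLowerCase()". The class name TurkishLowerCaseSketch and the method turkishLower are made up for illustration and are not part of Lucene.

```java
public class TurkishLowerCaseSketch {

    // Hypothetical helper, not a Lucene API: mimics "map 'I' to dotless i,
    // then apply a locale-independent per-char lowercase".
    static String turkishLower(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == 'I') {
                // LATIN CAPITAL LETTER I (U+0049) becomes
                // LATIN SMALL LETTER DOTLESS I (U+0131).
                out.append('\u0131');
            } else {
                // Character.toLowerCase maps U+0130 (capital I with dot above)
                // to a plain 'i' (U+0069), so no special case is needed for it.
                out.append(Character.toLowerCase(c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Compare against escaped literals to avoid console encoding issues.
        System.out.println("istanbul".equals(turkishLower("\u0130STANBUL")));
        System.out.println("izmir".equals(turkishLower("\u0130ZM\u0130R")));
        System.out.println("\u0131sparta".equals(turkishLower("ISPARTA")));
    }
}
```

The sketch works on single chars only, which is enough here because none of the characters involved lie outside the Basic Multilingual Plane.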
> LowerCaseFilter for Turkish language
> ------------------------------------
>
> Key: LUCENE-2102
> URL: https://issues.apache.org/jira/browse/LUCENE-2102
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Analysis
> Affects Versions: 3.0
> Reporter: Ahmet Arslan
> Assignee: Robert Muir
> Priority: Minor
> Fix For: 3.1
>
> Attachments: LUCENE-2102.patch, LUCENE-2102.patch, LUCENE-2102.patch
>
>
> java.lang.Character.toLowerCase() converts 'I' to 'i'; however, in the Turkish
> alphabet the lowercase of 'I' is not 'i'. It is LATIN SMALL LETTER DOTLESS I.