Use ISOLatin1AccentFilter, although it is not perfect... So I made ISOLatin2AccentFilter for me and changed this method. We use our own analysers, so you would use something like this
result = new
org.apache.lucene.analysis.WhitespaceTokenizer(reader);
result = new ISOLatin2AccentFilter(result);
result = new org.apache.lucene.analysis.LowerCaseFilter(result);
* To replace accented characters in a String by unaccented equivalents.
*/
public final static String removeAccents(String input) {
final StringBuffer output = new StringBuffer();
for (int i = 0; i < input.length(); i++) {
switch (input.charAt(i)) {
case '\u00C0' : // À
case '\u00C1' : // Á
case '\u00C2' : // Â
case '\u00C3' : // Ã
case '\u00C5' : // Å
output.append("A");
break;
case '\u00C4' : // Ä
case '\u00C6' : // Æ
output.append("AE");
break;
case '\u00C7' : // Ç
output.append("C");
break;
case '\u00C8' : // È
case '\u00C9' : // É
case '\u00CA' : // Ê
case '\u00CB' : // Ë
output.append("E");
break;
case '\u00CC' : // Ì
case '\u00CD' : // Í
case '\u00CE' : // Î
case '\u00CF' : // Ï
output.append("I");
break;
case '\u00D0' : // Ð
output.append("D");
break;
case '\u00D1' : // Ñ
output.append("N");
break;
case '\u00D2' : // Ò
case '\u00D3' : // Ó
case '\u00D4' : // Ô
case '\u00D5' : // Õ
case '\u00D8' : // Ø
output.append("O");
break;
case '\u00D6' : // Ö
case '\u0152' : // Œ
output.append("OE");
break;
case '\u00DE' : // Þ
output.append("TH");
break;
case '\u00D9' : // Ù
case '\u00DA' : // Ú
case '\u00DB' : // Û
output.append("U");
break;
case '\u00DC' : // Ü
output.append("UE");
break;
case '\u00DD' : // Ý
case '\u0178' : // Ÿ
output.append("Y");
break;
case '\u00E0' : // à
case '\u00E1' : // á
case '\u00E2' : // â
case '\u00E3' : // ã
case '\u00E5' : // å
output.append("a");
break;
case '\u00E4' : // ä
case '\u00E6' : // æ
output.append("ae");
break;
case '\u00E7' : // ç
output.append("c");
break;
case '\u00E8' : // è
case '\u00E9' : // é
case '\u00EA' : // ê
case '\u00EB' : // ë
output.append("e");
break;
case '\u00EC' : // ì
case '\u00ED' : // í
case '\u00EE' : // î
case '\u00EF' : // ï
output.append("i");
break;
case '\u00F0' : // ð
output.append("d");
break;
case '\u00F1' : // ñ
output.append("n");
break;
case '\u00F2' : // ò
case '\u00F3' : // ó
case '\u00F4' : // ô
case '\u00F5' : // õ
case '\u00F8' : // ø
output.append("o");
break;
case '\u00F6' : // ö
case '\u0153' : // œ
output.append("oe");
break;
case '\u00DF' : // ß
output.append("ss");
break;
case '\u00FE' : // þ
output.append("th");
break;
case '\u00F9' : // ù
case '\u00FA' : // ú
case '\u00FB' : // û
output.append("u");
break;
case '\u00FC' : // ü
output.append("ue");
break;
case '\u00FD' : // ý
case '\u00FF' : // ÿ
output.append("y");
break;
default :
output.append(input.charAt(i));
break;
}
}
return output.toString();
}
}
Regards
Uwe Goetzke
Leiter Produktentwicklung
Healy Hudson GmbH
Procurement & Retail Solutions
-----Ursprüngliche Nachricht-----
Von: Sascha Fahl [mailto:[EMAIL PROTECTED]
Gesendet: Dienstag, 18. November 2008 13:07
An: [email protected]
Betreff: Transforming german umlaute like ö,ä,ü,ß into oe, ae, ue, ss
Hi,
what is the best to transform the german umlaute ö,ä,ü,ß into oe, ae,
ue, ss during the process of analyzing?
Thanks,
Sascha Fahl
Softwareentwicklung
evenity GmbH
Zu den Mühlen 19
D-35390 Gießen
Mail: [EMAIL PROTECTED]
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
-----------------------------------------------------------------------
Healy Hudson GmbH - D-55252 Mainz Kastel
Geschäftsführer Christian Konhäuser - Amtsgericht Wiesbaden HRB 12076
Diese Email ist vertraulich. Wenn Sie nicht der beabsichtigte Empfänger sind,
dürfen Sie die Informationen nicht offen legen oder benutzen. Wenn Sie diese
Email durch einen Fehler bekommen haben, teilen Sie uns dies bitte umgehend
mit, indem Sie diese Email an den Absender zurückschicken. Bitte löschen Sie
danach diese Email.
This email is confidential. If you are not the intended recipient, you must not
disclose or use this information contained in it. If you have received this
email in error please tell us immediately by return email and delete the
document.
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
