Re: AW: Transforming German umlauts like ö, ä, ü, ß into oe, ae, ue, ss
>> Where do I get the CharFilter library? I'm using Lucene, not Solr.
>>
>> Thanks,
>> Sascha
>
> CharFilter is included in recent Solr nightly builds. It is not an
> out-of-the-box solution for Lucene yet, sorry. If I have time, I will
> make it available for Lucene this weekend.

A patch for Lucene is now available at:

https://issues.apache.org/jira/browse/LUCENE-1466

Koji

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
AW: Transforming German umlauts like ö, ä, ü, ß into oe, ae, ue, ss
Use ISOLatin1AccentFilter, although it is not perfect...
So I made my own ISOLatin2AccentFilter and changed this method.

We use our own analyzers, so you would use something like this:

    result = new org.apache.lucene.analysis.WhitespaceTokenizer(reader);
    result = new ISOLatin2AccentFilter(result);
    result = new org.apache.lucene.analysis.LowerCaseFilter(result);

    /**
     * To replace accented characters in a String by unaccented equivalents.
     */
    public final static String removeAccents(String input) {
        final StringBuffer output = new StringBuffer();
        for (int i = 0; i < input.length(); i++) {
            switch (input.charAt(i)) {
                case '\u00C0' : // À
                case '\u00C1' : // Á
                case '\u00C2' : // Â
                case '\u00C3' : // Ã
                case '\u00C5' : // Å
                    output.append("A");
                    break;
                case '\u00C4' : // Ä
                case '\u00C6' : // Æ
                    output.append("AE");
                    break;
                case '\u00C7' : // Ç
                    output.append("C");
                    break;
                case '\u00C8' : // È
                case '\u00C9' : // É
                case '\u00CA' : // Ê
                case '\u00CB' : // Ë
                    output.append("E");
                    break;
                case '\u00CC' : // Ì
                case '\u00CD' : // Í
                case '\u00CE' : // Î
                case '\u00CF' : // Ï
                    output.append("I");
                    break;
                case '\u00D0' : // Ð
                    output.append("D");
                    break;
                case '\u00D1' : // Ñ
                    output.append("N");
                    break;
                case '\u00D2' : // Ò
                case '\u00D3' : // Ó
                case '\u00D4' : // Ô
                case '\u00D5' : // Õ
                case '\u00D8' : // Ø
                    output.append("O");
                    break;
                case '\u00D6' : // Ö
                case '\u0152' : // Œ
                    output.append("OE");
                    break;
                case '\u00DE' : // Þ
                    output.append("TH");
                    break;
                case '\u00D9' : // Ù
                case '\u00DA' : // Ú
                case '\u00DB' : // Û
                    output.append("U");
                    break;
                case '\u00DC' : // Ü
                    output.append("UE");
                    break;
                case '\u00DD' : // Ý
                case '\u0178' : // Ÿ
                    output.append("Y");
                    break;
                case '\u00E0' : // à
                case '\u00E1' : // á
                case '\u00E2' : // â
                case '\u00E3' : // ã
                case '\u00E5' : // å
                    output.append("a");
                    break;
                case '\u00E4' : // ä
                case '\u00E6' : // æ
                    output.append("ae");
                    break;
                case '\u00E7' : // ç
                    output.append("c");
                    break;
                case '\u00E8' : // è
                case '\u00E9' : // é
                case '\u00EA' : // ê
                case '\u00EB' : // ë
                    output.append("e");
                    break;
                case '\u00EC' : // ì
                case '\u00ED' : // í
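If only the four characters from the subject line matter, the same switch-based idea can be cut down to a handful of cases. The following is a minimal sketch in plain Java with no Lucene dependency. The names `GermanUmlautFolder` and `fold` are hypothetical and are not taken from ISOLatin1AccentFilter or from the ISOLatin2AccentFilter described in this message; the uppercase mappings follow the AE/OE/UE convention used in this thread.

```java
// Hypothetical helper for illustration only; not part of Lucene or Solr.
final class GermanUmlautFolder {

    private GermanUmlautFolder() {}

    /** Returns the input with German umlauts and sharp s transliterated. */
    static String fold(String input) {
        // Replacements are longer than the originals, so reserve a little extra room.
        StringBuilder output = new StringBuilder(input.length() + 8);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '\u00C4': output.append("AE"); break; // Ä
                case '\u00D6': output.append("OE"); break; // Ö
                case '\u00DC': output.append("UE"); break; // Ü
                case '\u00E4': output.append("ae"); break; // ä
                case '\u00F6': output.append("oe"); break; // ö
                case '\u00FC': output.append("ue"); break; // ü
                case '\u00DF': output.append("ss"); break; // ß
                default:       output.append(c);           // everything else unchanged
            }
        }
        return output.toString();
    }
}
```

For example, `GermanUmlautFolder.fold("Gießen")` returns `"Giessen"`. Inside a token filter, a method like this would be applied to each term's text.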
Re: AW: Transforming German umlauts like ö, ä, ü, ß into oe, ae, ue, ss
Uwe Goetzke wrote:
> Use ISOLatin1AccentFilter, although it is not perfect...
> So I made ISOLatin2AccentFilter for me and changed this method.

Or use the CharFilter library. It is only available for Solr as of now, though.

See:
https://issues.apache.org/jira/secure/attachment/12392639/character-normalization.JPG
https://issues.apache.org/jira/browse/SOLR-822

Koji
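For readers without Solr, the general idea of normalizing characters before they reach the tokenizer can be approximated by wrapping the input Reader. The sketch below is only an illustration in plain Java: `UmlautExpandingReader` and `normalize` are hypothetical names, this is not the CharFilter API from SOLR-822, and it makes no attempt to map token offsets back to the original text.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical class for illustration; NOT the SOLR-822 CharFilter API.
final class UmlautExpandingReader extends Reader {
    private final Reader in;
    // Second character of a two-character replacement, or -1 if none is waiting.
    private int pending = -1;

    UmlautExpandingReader(Reader in) {
        this.in = in;
    }

    @Override
    public int read() throws IOException {
        if (pending != -1) {
            int c = pending;
            pending = -1;
            return c;
        }
        int c = in.read();
        switch (c) {
            case '\u00E4': pending = 'e'; return 'a'; // ä -> ae
            case '\u00F6': pending = 'e'; return 'o'; // ö -> oe
            case '\u00FC': pending = 'e'; return 'u'; // ü -> ue
            case '\u00DF': pending = 's'; return 's'; // ß -> ss
            default:       return c;                  // also passes through -1 (end of stream)
        }
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        int n = 0;
        while (n < len) {
            int c = read();
            if (c == -1) {
                break;
            }
            buf[off + n++] = (char) c;
        }
        return n == 0 ? -1 : n;
    }

    @Override
    public void close() throws IOException {
        in.close();
    }

    /** Convenience for trying it out: runs a String through the reader. */
    static String normalize(String text) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4]; // deliberately small to exercise buffer boundaries
        try (Reader r = new UmlautExpandingReader(new StringReader(text))) {
            for (int n = r.read(buf, 0, buf.length); n != -1; n = r.read(buf, 0, buf.length)) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }
}
```

In a Lucene setup, such a wrapper would be placed around the Reader that is handed to the tokenizer, so every downstream filter sees the expanded text. Only the lowercase umlauts and ß are handled here to keep the sketch short.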
Re: AW: Transforming German umlauts like ö, ä, ü, ß into oe, ae, ue, ss
Where do I get the CharFilter library? I'm using Lucene, not Solr.

Thanks,
Sascha

On 18.11.2008 at 14:11, Koji Sekiguchi wrote:

> Uwe Goetzke wrote:
>> Use ISOLatin1AccentFilter, although it is not perfect...
>> So I made ISOLatin2AccentFilter for me and changed this method.
>
> Or use the CharFilter library. It is only available for Solr as of now, though.
>
> See:
> https://issues.apache.org/jira/secure/attachment/12392639/character-normalization.JPG
> https://issues.apache.org/jira/browse/SOLR-822
>
> Koji

Sascha Fahl
Software Development
evenity GmbH
Zu den Mühlen 19
D-35390 Gießen
Mail: [EMAIL PROTECTED]
Re: AW: Transforming German umlauts like ö, ä, ü, ß into oe, ae, ue, ss
Sascha Fahl wrote:
> Where do I get the CharFilter library? I'm using Lucene, not Solr.
>
> Thanks,
> Sascha

CharFilter is included in recent Solr nightly builds. It is not an out-of-the-box solution for Lucene yet, sorry. If I have time, I will make it available for Lucene this weekend.

Koji