Re: AW: Transforming German umlauts like ö, ä, ü, ß into oe, ae, ue, ss

2008-11-23 Thread Koji Sekiguchi

>> Where do I get the CharFilter library? I'm using Lucene, not Solr.
>>
>> Thanks,
>> Sascha
> CharFilter is included in recent Solr nightly builds.
> It is not an out-of-the-box solution for Lucene yet, sorry.
> If I have time, I will port it to Lucene this weekend.

The patch for Lucene is now available at:
https://issues.apache.org/jira/browse/LUCENE-1466

Koji


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



AW: Transforming German umlauts like ö, ä, ü, ß into oe, ae, ue, ss

2008-11-18 Thread Uwe Goetzke
Use ISOLatin1AccentFilter, although it is not perfect...
So I wrote my own ISOLatin2AccentFilter and changed this method.
We use our own analyzers, so you would use something like this:

result = new org.apache.lucene.analysis.WhitespaceTokenizer(reader);
result = new ISOLatin2AccentFilter(result);
result = new org.apache.lucene.analysis.LowerCaseFilter(result);


/**
 * To replace accented characters in a String by unaccented equivalents.
 * German umlauts are expanded to two letters (Ä -> AE, ö -> oe, ...).
 */
public final static String removeAccents(String input) {
    final StringBuffer output = new StringBuffer();
    for (int i = 0; i < input.length(); i++) {
        switch (input.charAt(i)) {
            case '\u00C0' : // À
            case '\u00C1' : // Á
            case '\u00C2' : // Â
            case '\u00C3' : // Ã
            case '\u00C5' : // Å
                output.append("A");
                break;
            case '\u00C4' : // Ä
            case '\u00C6' : // Æ
                output.append("AE");
                break;
            case '\u00C7' : // Ç
                output.append("C");
                break;
            case '\u00C8' : // È
            case '\u00C9' : // É
            case '\u00CA' : // Ê
            case '\u00CB' : // Ë
                output.append("E");
                break;
            case '\u00CC' : // Ì
            case '\u00CD' : // Í
            case '\u00CE' : // Î
            case '\u00CF' : // Ï
                output.append("I");
                break;
            case '\u00D0' : // Ð
                output.append("D");
                break;
            case '\u00D1' : // Ñ
                output.append("N");
                break;
            case '\u00D2' : // Ò
            case '\u00D3' : // Ó
            case '\u00D4' : // Ô
            case '\u00D5' : // Õ
            case '\u00D8' : // Ø
                output.append("O");
                break;
            case '\u00D6' : // Ö
            case '\u0152' : // Œ
                output.append("OE");
                break;
            case '\u00DE' : // Þ
                output.append("TH");
                break;
            case '\u00D9' : // Ù
            case '\u00DA' : // Ú
            case '\u00DB' : // Û
                output.append("U");
                break;
            case '\u00DC' : // Ü
                output.append("UE");
                break;
            case '\u00DD' : // Ý
            case '\u0178' : // Ÿ
                output.append("Y");
                break;
            case '\u00E0' : // à
            case '\u00E1' : // á
            case '\u00E2' : // â
            case '\u00E3' : // ã
            case '\u00E5' : // å
                output.append("a");
                break;
            case '\u00E4' : // ä
            case '\u00E6' : // æ
                output.append("ae");
                break;
            case '\u00E7' : // ç
                output.append("c");
                break;
            case '\u00E8' : // è
            case '\u00E9' : // é
            case '\u00EA' : // ê
            case '\u00EB' : // ë
                output.append("e");
                break;
case '\u00EC' : // ì
case '\u00ED' : // í
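
The listing above is cut off in the archive. As a compact illustration of the same idea (this is not Uwe's ISOLatin2AccentFilter, just a minimal sketch), the German-specific part can be written as a lookup table. The class name UmlautTransliterator and the table contents are assumptions made for this example; it needs Java 9 or later for Map.of and does not touch any Lucene API.

```java
import java.util.Map;

final class UmlautTransliterator {

    // Hypothetical mapping table for this sketch: only the German
    // umlauts and sharp s, written as Unicode escapes like the listing above.
    private static final Map<Character, String> GERMAN = Map.of(
        '\u00C4', "AE",  // Ä
        '\u00D6', "OE",  // Ö
        '\u00DC', "UE",  // Ü
        '\u00E4', "ae",  // ä
        '\u00F6', "oe",  // ö
        '\u00FC', "ue",  // ü
        '\u00DF', "ss"); // ß

    /** Replaces each mapped character by its two-letter form; all other characters pass through. */
    static String transliterate(String input) {
        StringBuilder output = new StringBuilder(input.length() + 8);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            String replacement = GERMAN.get(c);
            if (replacement != null) {
                output.append(replacement);
            } else {
                output.append(c);
            }
        }
        return output.toString();
    }

    public static void main(String[] args) {
        // prints "Giessen, Muehlen, AEpfel"
        System.out.println(transliterate("Gie\u00DFen, M\u00FChlen, \u00C4pfel"));
    }
}
```

Note that the capital umlauts expand to two capitals (AE, OE, UE), as in the listing above; since the analyzer chain applies LowerCaseFilter afterwards, this makes no difference for the indexed terms.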

Re: AW: Transforming German umlauts like ö, ä, ü, ß into oe, ae, ue, ss

2008-11-18 Thread Koji Sekiguchi

Uwe Goetzke wrote:
> Use ISOLatin1AccentFilter, although it is not perfect...
> So I wrote my own ISOLatin2AccentFilter and changed this method.

Or use the CharFilter library. It is only available for Solr as of now, though.

See:
https://issues.apache.org/jira/secure/attachment/12392639/character-normalization.JPG
https://issues.apache.org/jira/browse/SOLR-822
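
To make the difference from a token filter concrete: a character-level filter rewrites the text stream before the tokenizer sees it. The following is only a sketch of that idea using a plain java.io.Reader wrapper; it is not the CharFilter API from SOLR-822, and the class name UmlautMappingReader and the helper normalize are made up for this example. A real CharFilter also has to deal with offset correction for the expanded characters, which this sketch ignores.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

final class UmlautMappingReader extends Reader {
    private final Reader in;
    private String pending = "";  // current expansion, e.g. "ue"
    private int pendingPos = 0;   // next char of the expansion to hand out

    UmlautMappingReader(Reader in) {
        this.in = in;
    }

    // Returns the two-letter expansion for a character, or null if unmapped.
    private static String expand(int c) {
        switch (c) {
            case '\u00C4': return "AE"; // Ä
            case '\u00D6': return "OE"; // Ö
            case '\u00DC': return "UE"; // Ü
            case '\u00E4': return "ae"; // ä
            case '\u00F6': return "oe"; // ö
            case '\u00FC': return "ue"; // ü
            case '\u00DF': return "ss"; // ß
            default: return null;
        }
    }

    @Override
    public int read() throws IOException {
        // First drain what is left of a previous expansion.
        if (pendingPos < pending.length()) {
            return pending.charAt(pendingPos++);
        }
        int c = in.read();
        if (c == -1) {
            return -1;
        }
        String mapped = expand(c);
        if (mapped == null) {
            return c;
        }
        pending = mapped;
        pendingPos = 1;
        return mapped.charAt(0);
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        int n = 0;
        while (n < len) {
            int c = read();
            if (c == -1) {
                break;
            }
            buf[off + n++] = (char) c;
        }
        return n == 0 ? -1 : n;
    }

    @Override
    public void close() throws IOException {
        in.close();
    }

    // Convenience helper for trying the reader on a String.
    static String normalize(String text) {
        try (Reader r = new UmlautMappingReader(new StringReader(text))) {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[16];
            for (int n = r.read(buf, 0, buf.length); n != -1; n = r.read(buf, 0, buf.length)) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In a Lucene setup, such a reader would be wrapped around the field's Reader before it is passed to the tokenizer, so that every tokenizer and filter downstream already sees "ue" instead of "ü".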

Koji





Re: AW: Transforming German umlauts like ö, ä, ü, ß into oe, ae, ue, ss

2008-11-18 Thread Sascha Fahl

Where do I get the CharFilter library? I'm using Lucene, not Solr.

Thanks,
Sascha

On 18.11.2008 at 14:11, Koji Sekiguchi wrote:


> Uwe Goetzke wrote:
>> Use ISOLatin1AccentFilter, although it is not perfect...
>> So I wrote my own ISOLatin2AccentFilter and changed this method.
>
> Or use the CharFilter library. It is only available for Solr as of now, though.
>
> See:
> https://issues.apache.org/jira/secure/attachment/12392639/character-normalization.JPG
> https://issues.apache.org/jira/browse/SOLR-822
>
> Koji




Sascha Fahl
Softwareentwicklung

evenity GmbH
Zu den Mühlen 19
D-35390 Gießen

Mail: [EMAIL PROTECTED]



Re: AW: Transforming German umlauts like ö, ä, ü, ß into oe, ae, ue, ss

2008-11-18 Thread Koji Sekiguchi

Sascha Fahl wrote:

> Where do I get the CharFilter library? I'm using Lucene, not Solr.
>
> Thanks,
> Sascha

CharFilter is included in recent Solr nightly builds.
It is not an out-of-the-box solution for Lucene yet, sorry.
If I have time, I will port it to Lucene this weekend.

Koji


