Hi Jürgen,

I'm aware that mapping umlauts produces many false positives, but we have noticed that some of our users omit them while searching. I guess we'll have to make a product decision there, because we cannot cover all use cases anyway.
Thanks for your response!

Best,
Kresimir

On Saturday, November 29, 2014 6:41:17 PM UTC+1, Jürgen Wagner (DVT) wrote:
>
> Hello Kresimir,
>
> as a native speaker of German and a linguist, I know you usually want to preserve the umlaut, but for searches you may want to relax the precision of matching. So, why not do precisely this? If you have "über" or "ueber" in your query, replace it by "über OR ueber". And if you want to take care of those Americans who believe these two dots do not carry any meaning at all (heavy grin at this point), you may even add "OR uber". Syntactically, "uber" is wrong. This would only be a convenience rule for users thinking they can simply omit umlaut dots or who are incapable of typing umlaut characters on their keyboards.
>
> Note: when it comes to German last names, the names Ganser, Gänser and Gaenser would be considered three entirely different names, although the alternative spelling (e.g., in plain e-mail addresses) of Gänser could be Gaenser. Mapping umlauts will get you false positives.
>
> Also be careful with the reverse. "ue", "oe" and "ae" cannot simply be spelled as "ü", "ö" or "ä". In a word like "Zooeingang" (zoo entrance), the composite is actually made of "Zoo" and "Eingang", so the "oe" must not be interpreted as "ö".
>
> Similar issues exist with "ß" and "ss".
>
> Well, most likely these funny cases won't matter too much, so I suggest trying a simple disjunctive expansion for a start.
>
> Best regards,
> --Jürgen
>
> On Tue, Nov 18, 2014 at 12:30 PM, Krešimir Slugan <[email protected]> wrote:
>
>> Hi,
>>
>> To handle the German language in search, I have to be able to provide the same results if a user searches for, e.g., über, uber or ueber.
>>
>> I would do this at index time, where I would have über in the data. But if I use just the asciifolding filter, I lose the information that this was a word with an umlaut, and I can't get the ueber token.
>> If I use a char_filter, it is applied before analysis and I would not be able to get uber.
>>
>> Is it possible to preserve the original with a char filter, or to apply it after the analysis?
>>
>> Cheers,
>>
>> Kresimir

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/ec79cc5f-a6e1-4fc4-8f60-7f1ab31b60ad%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
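
The disjunctive expansion Jürgen suggests could be sketched on the client side roughly as follows. This is only an illustration under assumptions, not something taken from the thread: the helper names `expand_term` and `expand_query` are hypothetical, the output is assumed to be fed to a query syntax that understands `OR`, and everything is lowercased on the assumption that the index analyzer lowercases as well. A bare "uber" query is not expanded here, and the reverse mapping ("oe" to "ö") produces harmless but wrong variants for compounds like "Zooeingang", exactly as warned above.

```python
# Minimal sketch of query-time umlaut expansion (hypothetical helper names).
UMLAUT_TO_DIGRAPH = {"ä": "ae", "ö": "oe", "ü": "ue"}
UMLAUT_TO_BASE = {"ä": "a", "ö": "o", "ü": "u"}
DIGRAPH_TO_UMLAUT = {v: k for k, v in UMLAUT_TO_DIGRAPH.items()}


def _replace_all(text, table):
    """Apply every old -> new substitution in `table` to `text`."""
    for old, new in table.items():
        text = text.replace(old, new)
    return text


def expand_term(term, include_stripped=True):
    """Return the spelling variants of one term, original spelling first."""
    term = term.lower()
    umlaut_form = _replace_all(term, DIGRAPH_TO_UMLAUT)   # ueber -> über
    digraph_form = _replace_all(term, UMLAUT_TO_DIGRAPH)  # über  -> ueber
    candidates = [term, umlaut_form, digraph_form]
    if include_stripped:
        # Convenience variant for users who simply drop the dots: über -> uber
        candidates.append(_replace_all(umlaut_form, UMLAUT_TO_BASE))
    # Remove duplicates while keeping the order of first appearance.
    return list(dict.fromkeys(candidates))


def expand_query(query, include_stripped=True):
    """Rewrite each whitespace-separated term as an OR group of its variants."""
    parts = []
    for term in query.split():
        variants = expand_term(term, include_stripped)
        if len(variants) == 1:
            parts.append(variants[0])
        else:
            parts.append("(" + " OR ".join(variants) + ")")
    return " ".join(parts)


if __name__ == "__main__":
    print(expand_query("über uns"))  # (über OR ueber OR uber) uns
```

Passing `include_stripped=False` drops the "uber" variant, which keeps the expansion closer to correct German spelling at the cost of missing users who omit the dots.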
