Re: umlauts / diacritic expansion

Michael Sokolov Wed, 17 Apr 2019 05:42:05 -0700

Right, ASCIIFoldingFilter seems to map Ü [LATIN CAPITAL LETTER U
WITH DIAERESIS] to "U", not "UE".
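
For illustration, here is a minimal Python sketch of the expansion asked
about in the original question: for each token, produce the original form,
the conventional two-letter expansion, and the stripped form. This is not
Lucene API. The mapping table and the names `expand_variants` and
`strip_diacritics` are assumptions made up for this example, and the table
only covers the German cases mentioned in this thread.

```python
import unicodedata

# Conventional German two-letter expansions (illustrative table, an assumption;
# other languages would need their own entries).
GERMAN_EXPANSIONS = {
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
}


def strip_diacritics(text):
    """Drop combining marks after NFD decomposition, so 'ü' becomes 'u'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def expand_variants(token):
    """Return the token plus its expanded and stripped ASCII forms, deduplicated."""
    # Compose first so that 'u' + combining diaeresis matches the table key 'ü'.
    token = unicodedata.normalize("NFC", token)
    expanded = "".join(GERMAN_EXPANSIONS.get(ch, ch) for ch in token)
    # 'ß' has no Unicode decomposition, so map it explicitly before stripping marks.
    stripped = strip_diacritics(token.replace("ß", "ss"))
    variants = []
    for candidate in (token, expanded, stripped):
        if candidate not in variants:
            variants.append(candidate)
    return variants


print(expand_variants("für"))
```

Indexing all three variants at the same position (or expanding the query the
same way) would let "für", "fuer" and "fur" match each other; how best to wire
that into an analysis chain is left open here.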


On Wed, Apr 17, 2019 at 12:26 AM Ralf Heyde <[email protected]> wrote:
>
> Ah sorry, ASCII folding for umlauts will result in ue/ae, ß/ss, etc.
>
> If you use Levenshtein distance, you could allow a distance of 1 or 2;
> this might be close to what you need.
>
> > On 16.04.2019 at 20:08, Michael Sokolov <[email protected]> wrote:
> >
> > I'm learning how to index/search German today and understand that
> > vowels with umlauts are conventionally expanded into two ASCII
> > characters, e.g. "für" -> "fuer". People may search for the expanded
> > form "fuer", but they might also search with the diacritic, or
> > lazily search using the stripped form "fur".
> >
> > My question: is there a standard CharFilter or TokenFilter that
> > expands to both (ASCII) forms, for characters with umlauts and perhaps
> > other diacritics I might be unaware of in other languages having
> > similar multiple renderings in ASCII?
> >
> > -Mike
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [email protected]
> > For additional commands, e-mail: [email protected]
> >
>
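
The edit-distance suggestion quoted above can be checked by hand with a small
Python sketch. It is a plain dynamic-programming Levenshtein distance, not
Lucene's fuzzy matching; the function name `levenshtein` is just a label
chosen for this example, and the inputs are assumed to be in composed (NFC)
form so that "ü" counts as one character.

```python
def levenshtein(a, b):
    """Edit distance counting single-character inserts, deletes and substitutions."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


print(levenshtein("für", "fur"))   # stripped form
print(levenshtein("für", "fuer"))  # expanded form
```

The stripped form is one edit away and the expanded form two, which is where
the "1 or 2" comes from. Note that a distance of 2 on such a short word also
admits unrelated terms ("für" to "fair" is 2 as well), so fuzzy matching is
looser than indexing the explicit variants.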
