Ah, sorry: ASCII folding for umlauts will result in ue/ae, ß/ss, etc. You could allow a distance of 1 or 2, assuming you use Levenshtein distance; this might be close to what you need.
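
To make the distance suggestion concrete, here is a minimal sketch in plain Java (no Lucene dependency) that computes the Levenshtein distance between the indexed form "für" and the three query spellings from the question. The class name `UmlautDistance` and the method `levenshtein` are made up for illustration and are not part of any Lucene API.

```java
import java.util.List;

// Hypothetical helper for illustration only; not a Lucene class.
final class UmlautDistance {

    /** Dynamic-programming Levenshtein distance, counted in UTF-16 chars. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        // Distance from the empty prefix of a to each prefix of b.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            // Reuse the two rows instead of allocating a full matrix.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        String indexed = "f\u00fcr"; // "für", escaped to avoid source-encoding issues
        for (String query : List.of("f\u00fcr", "fur", "fuer")) {
            System.out.println(query + " -> " + levenshtein(indexed, query));
        }
    }
}
```

The stripped form "fur" is one edit away from "für" and the expanded form "fuer" is two edits away, so a maximum distance of 2 would cover both spellings. Keep in mind that for a three-letter word, two edits will also match many unrelated terms, so the looser setting trades precision for recall.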

> On 16.04.2019 at 20:08, Michael Sokolov <msoko...@gmail.com> wrote:
>
> I'm learning how to index/search German today and understanding that
> vowels with umlauts are conventionally expanded into two ASCII
> characters, e.g. "für" -> "fuer", so people may search for the expanded
> form "fuer", but they might also search with the diacritic, and
> finally they might lazily search using the stripped form "fur".
>
> My question: is there a standard CharFilter or TokenFilter that
> expands to both (ASCII) forms, for characters with umlauts and perhaps
> other diacritics I might be unaware of in other languages having
> similar multiple renderings in ASCII?
>
> -Mike
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org