Re: Folding of accented to non-accented only — leaving symbols

Alexandre Rafalovitch Mon, 13 Oct 2014 16:30:06 -0700

You are probably looking for ICU Folding, which is part of the ICU plugin:
https://github.com/elasticsearch/elasticsearch-analysis-icu
It is not explained in detail on that page, but you can see a long list of
normalizations in Lucene's Javadoc:
http://www.solr-start.com/javadoc/solr-lucene/org/apache/lucene/analysis/icu/ICUFoldingFilter.html


The explanation there is a little hairy, and you may need to chase
through the Unicode pages, but it should be the production-ready
approach in the end.
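
To make the goal concrete (fold accents, keep symbols such as £), here is a
minimal sketch of the idea in plain Python. It is an illustration only and not
what the ICU plugin does internally: it assumes the standard unicodedata
module, and the function name fold_accents is made up for this example.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Strip combining accents from letters; leave everything else untouched."""
    # NFD splits a precomposed letter such as "é" into "e" + U+0301.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop only nonspacing marks (category Mn), i.e. the accents themselves.
    stripped = "".join(
        ch for ch in decomposed if unicodedata.category(ch) != "Mn"
    )
    # Recompose whatever is left so the output is back in NFC form.
    return unicodedata.normalize("NFC", stripped)


print(fold_accents("café crème £5"))  # prints: cafe creme £5
```

The pound sign survives because it has no decomposition and is not a
combining mark. Letters without a decomposition, such as "ø" or "ß", also pass
through unchanged, so this is narrower than what a full folding filter does.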

Regards,
   Alex.

On 13 October 2014 15:30, Lee Gee <[email protected]> wrote:
> I know the asciifolding filter docs are really very clear on this, but it
> took me an embarrassingly long time to realise I was losing my currency
> symbol (£) to the ASCII folding filter.
>
> Other than creating my own character map with the char map filter, does
> there exist something of production quality that would translate accented
> UTF-8 characters of the Latin alphabet into non-accented characters in the
> ASCII range?
>
> TIA
> Lee
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/ff95c6ec-7907-454e-bd58-774ee173f4e3%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

