I have an analysis chain like this for some Spanish text:

    standard asciifolding lowercase es_stop_filter es_stem_filter es_synonyms
With synonyms at the end, after all the other filters, I have to define my synonyms in their stemmed, ASCII-folded, lowercase forms. So instead of defining a synonym set like "vacuna, vacunación, inmunización", I have to define it as "vacun, vacunacion, inmunizacion". With a very aggressive stemmer like Snowball for English, we would have to define "intern, global" as a synonym mapping when we'd really want to write "international, global".

This is counter-intuitive for the folks who define our synonyms, as they think in dictionary terms and not in stemmed tokens. To see which tokens to specify in the synonyms file, they need access to a "standard asciifolding lowercase es_stop_filter es_stem_filter" analysis chain, i.e. everything but the synonym filter.

In this blog post about Solr <http://www.igate.com/iblog/index.php/stemming-and-synonyms-in-apache-solr/>, the author mentions that one could define a "custom tokenizer that returns the stemmed form of words from the synonyms file" to get around this. Is it possible to configure Elasticsearch this way?
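
For illustration, the manual step described above (running each dictionary-form synonym through everything but the synonym filter) could be scripted roughly as follows. This is only a sketch under stated assumptions: `analyze_synonym_line` and `fold_and_lowercase` are hypothetical helper names, and `fold_and_lowercase` is a stand-in that only lowercases and strips diacritics. It does not stem or remove stopwords, so a real setup would presumably swap in a callable that sends each term through the actual "standard asciifolding lowercase es_stop_filter es_stem_filter" chain (for example via Elasticsearch's analyze API). Only comma-separated synonym sets are handled, not explicit "=>" mappings.

```python
import unicodedata
from typing import Callable, List


def fold_and_lowercase(term: str) -> List[str]:
    """Stand-in analyzer (assumption): lowercase and strip diacritics.

    It does NOT stem or drop stopwords; replace it with a callable that
    runs the real analysis chain minus the synonym filter.
    """
    decomposed = unicodedata.normalize("NFKD", term.lower())
    folded = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return folded.split()


def analyze_synonym_line(line: str, analyze: Callable[[str], List[str]]) -> str:
    """Rewrite one comma-separated synonym set into its analyzed form."""
    analyzed_terms: List[str] = []
    for term in line.split(","):
        tokens = analyze(term.strip())
        if not tokens:
            continue  # the whole term was removed, e.g. it was a stopword
        analyzed = " ".join(tokens)
        if analyzed not in analyzed_terms:  # terms may collapse to the same form
            analyzed_terms.append(analyzed)
    return ", ".join(analyzed_terms)


if __name__ == "__main__":
    line = "vacuna, vacunación, inmunización"
    print(analyze_synonym_line(line, fold_and_lowercase))
```

With a preprocessing step of this kind, the people maintaining the synonyms could keep writing dictionary forms, and the analyzed file would be regenerated whenever the source file or the analysis chain changes.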
