I have an analysis chain like this for some Spanish text:
standard asciifolding lowercase es_stop_filter es_stem_filter es_synonyms

With synonyms at the end, after all the other filters, I have to define my 
synonyms in their stemmed, ASCII-folded, lowercase forms. So instead of 
defining a synonym set like "vacuna, vacunación, inmunización", I have to 
define it as "vacun, vacunacion, inmunizacion".
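
This is a rough Python model of what the pre-synonym part of the chain does to each synonym entry. It assumes the filters run in the order listed above. The whitespace split (standing in for the standard tokenizer), the tiny stop list, and `toy_stem` are hypothetical stand-ins; the real Spanish stemmer is far more involved. NFKD decomposition from `unicodedata` only approximates the asciifolding filter.

```python
import unicodedata

# Hypothetical, tiny Spanish stop list; the real es_stop_filter list is much longer.
STOPWORDS = {"de", "la", "el", "los", "las", "y"}


def ascii_fold(token: str) -> str:
    """Approximate asciifolding by decomposing and dropping combining accents."""
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def toy_stem(token: str) -> str:
    """Hypothetical stand-in for es_stem_filter: strip one plural or final vowel."""
    for suffix in ("as", "os", "es", "a", "o", "e"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(term: str) -> str:
    """Run one synonym entry through the pre-synonym filters, in chain order."""
    tokens = term.split()                               # rough stand-in for "standard"
    tokens = [ascii_fold(t) for t in tokens]            # asciifolding
    tokens = [t.lower() for t in tokens]                # lowercase
    tokens = [t for t in tokens if t not in STOPWORDS]  # es_stop_filter
    tokens = [toy_stem(t) for t in tokens]              # es_stem_filter
    return " ".join(tokens)


def stem_synonym_set(line: str) -> str:
    """Turn a dictionary-form synonym set into the form the synonym filter sees."""
    return ", ".join(analyze(term) for term in line.split(","))


print(stem_synonym_set("vacuna, vacunación, inmunización"))
# vacun, vacunacion, inmunizacion
```

With this toy stemmer, "vacunación" happens to stay unstemmed, as in the example above. A different stemmer would produce different tokens, so the synonym authors cannot predict them without running the chain.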

With a very aggressive stemmer such as the English Snowball stemmer, which 
reduces "international" to "intern", we would have to define "intern, global" 
as a synonym mapping when we really want to write "international, global".

This is counter-intuitive for the people who maintain our synonyms: they 
think in dictionary words, not stemmed tokens. To find out which tokens to 
put in the synonyms file, they need access to a separate "standard 
asciifolding lowercase es_stop_filter es_stem_filter" analysis chain that 
applies everything except the synonym filter.

In this blog post about Solr 
<http://www.igate.com/iblog/index.php/stemming-and-synonyms-in-apache-solr/>, 
the author suggests defining a "custom tokenizer that returns the stemmed 
form of words from the synonyms file" to get around this. Is it possible to 
configure Elasticsearch this way?
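
The blog's suggestion works inside the search engine, through a custom tokenizer applied to the synonyms file. A different approach, swapped in here and not taken from the post, is to keep the synonyms file in dictionary form and rewrite it offline before loading it. This is a minimal sketch that assumes the Solr synonym format (comma-separated sets, `=>` explicit mappings, `#` comments). `rewrite_synonyms` and `fold_and_lower` are hypothetical names. The `analyze` callable is where the real chain would plug in, for example a wrapper around Elasticsearch's `_analyze` API configured with the same tokenizer and filters; that wiring is not shown.

```python
import unicodedata
from typing import Callable, Iterable, List


def fold_and_lower(term: str) -> str:
    """Placeholder analyzer: ASCII-fold and lowercase only (no stop words, no stemming)."""
    decomposed = unicodedata.normalize("NFKD", term)
    folded = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(folded.lower().split())


def _rewrite_side(side: str, analyze: Callable[[str], str]) -> str:
    """Analyze each comma-separated term, dropping empty results and duplicates."""
    seen: List[str] = []
    for term in side.split(","):
        analyzed = analyze(term.strip())
        if analyzed and analyzed not in seen:
            seen.append(analyzed)
    return ", ".join(seen)


def rewrite_synonyms(lines: Iterable[str],
                     analyze: Callable[[str], str] = fold_and_lower) -> List[str]:
    """Rewrite a Solr-format synonyms file so every term is in analyzed form."""
    out: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            out.append(line)  # keep blank lines and comments unchanged
            continue
        if "=>" in line:
            left, right = line.split("=>", 1)
            out.append(f"{_rewrite_side(left, analyze)} => {_rewrite_side(right, analyze)}")
        else:
            out.append(_rewrite_side(line, analyze))
    return out


if __name__ == "__main__":
    sample = ["# vaccines", "Vacunación, VACUNACIÓN, inmunización", "gripe => influenza"]
    print("\n".join(rewrite_synonyms(sample)))
```

Entries that collapse to the same analyzed form, such as "Vacunación" and "VACUNACIÓN", are merged so the generated file does not carry redundant terms.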

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/a7009182-9577-4580-872a-1b121be3457d%40googlegroups.com.