Re: Question on stemming + synonyms and tokenizerFactory

Brian Fri, 14 Nov 2014 22:29:48 -0800

Once you have your mapping set up, create an application that itself 
constructs the analyzer you need. Then feed it your real words and let it 
generate the stemmed versions.


I don't think ES can be told to do this, but it provides the classes you 
need to do it yourself.
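
As a rough illustration of "feed it your real words and let it generate the 
stemmed versions", here is a minimal Python sketch. It is not the Lucene or 
ES analyzer: toy_stem and toy_analyze are hypothetical stand-ins (they only 
lowercase and strip a trailing "s"), and the comma-separated line format is 
an assumption. In a real application the analyze callable would wrap the 
analyzer built from your own mapping.

```python
from typing import Callable, List


def toy_stem(token: str) -> str:
    """Hypothetical stand-in for a real stemmer: drops one trailing 's'."""
    if token.endswith("s") and len(token) > 3:
        return token[:-1]
    return token


def toy_analyze(text: str) -> List[str]:
    """Stand-in analyzer: lowercase, split on whitespace, stem each token."""
    return [toy_stem(tok) for tok in text.lower().split()]


def stem_synonym_line(line: str, analyze: Callable[[str], List[str]]) -> str:
    """Run every term of a comma-separated synonym line through the analyzer."""
    terms = [t.strip() for t in line.split(",") if t.strip()]
    return ", ".join(" ".join(analyze(term)) for term in terms)


if __name__ == "__main__":
    # Each real-word line goes in; the analyzed (stemmed) line comes out.
    for raw in ["Running Shoes, sneakers", "cars, automobiles"]:
        print(stem_synonym_line(raw, toy_analyze))
```

The point of passing analyze in as a parameter is that the same loop works 
unchanged once the toy function is swapped for the real analyzer.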

For my own synonym processing, I do a Very Bad Thing. I create a synonym 
_type in which each document contains a list of words or phrases that are 
synonyms of each other. For a synonym query, I first query my synonym type. 
Then I OR together the queries for each of the matching synonym words or 
phrases.
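
The second step (OR-ing one query per matching synonym) could look roughly 
like the Python sketch below, which only builds the query body as a dict. 
This is an illustration, not my actual Java code: the function name 
build_synonym_query, the "synonyms" field name, and the "body" field in the 
usage example are all assumptions. It relies on the bool/should, match and 
match_phrase queries of the ES query DSL.

```python
from typing import Any, Dict, List


def build_synonym_query(field: str, user_text: str,
                        synonym_docs: List[Dict[str, Any]]) -> Dict[str, Any]:
    """OR together one query per synonym found in the matching synonym docs.

    synonym_docs are the sources of the hits returned by the first query
    against the synonym type; each is assumed to hold a "synonyms" list.
    """
    # Start with the user's own text, then add each synonym once, in order.
    terms = [user_text]
    for doc in synonym_docs:
        for term in doc.get("synonyms", []):
            if term not in terms:
                terms.append(term)

    should = []
    for term in terms:
        # Multi-word synonyms are matched as phrases, single words as terms.
        kind = "match_phrase" if " " in term else "match"
        should.append({kind: {field: term}})

    # A bool query whose should clauses act as a logical OR.
    return {"query": {"bool": {"should": should, "minimum_should_match": 1}}}


if __name__ == "__main__":
    hits = [{"synonyms": ["car", "automobile", "motor vehicle"]}]
    print(build_synonym_query("body", "car", hits))
```

Because the synonyms are looked up at query time, changing a synonym 
document changes the expanded query immediately.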

This is also much easier to maintain: I can update the synonyms on the fly 
and do not need to reindex the data at all. Not at all.

But it requires additional code, and it works best using the Java API. 
Some folks have indicated there are serious performance issues that make 
this a Bad Solution, but I have not seen any performance problems myself.

Oh, and all my words and phrases can be fully spelled out; it's only when 
they are used in the subsequent query that they get analyzed (tokenized, 
stemmed, and whatever else).

Brian

