I would not say "diabolical". It is perhaps not optimal given Lucene's 
internal design.

But I do something similar with table-based synonyms. In other words, I do 
not pre-build the index with the synonyms expanded into it. Instead, I 
maintain a table (index/type) of words and their synonyms, query that table 
to retrieve the synonyms, and then create the second and final query, which 
does an OR search across the word and its synonyms. (It's basically a group 
of should clauses, just like yours.)
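
A minimal sketch of that two-step flow, in Python, building only the query 
bodies as plain dicts. This is an illustration under assumptions, not the 
exact setup described above: the field names "word" and "body" are 
hypothetical, so substitute whatever your own mapping uses.

```python
def synonym_lookup_query(word):
    """Step 1: query body for the synonym table.

    Assumes each table document stores its key in a hypothetical,
    lowercased 'word' field.
    """
    return {"query": {"term": {"word": word.lower()}}}


def build_should_query(word, synonyms, field="body"):
    """Step 2: OR across the word and its synonyms using should clauses.

    'field' is a hypothetical name for the text field being searched.
    """
    # Deduplicate while keeping order, with the original word first.
    terms = list(dict.fromkeys([word] + list(synonyms)))
    return {
        "query": {
            "bool": {
                "should": [{"match": {field: t}} for t in terms],
                # At least one of the alternatives has to match.
                "minimum_should_match": 1,
            }
        }
    }
```

The synonyms returned by the first query feed the second function, and the 
resulting dict can be sent as the body of the final search request.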

I find that performance is fine, and accuracy and usefulness are superior. 
For example, a user query for synonyms of the wildcard BIG* might find 
BIG, LARGE, HUGE and also BIGHORN, SHEEP. And so on; some of the synonym 
lists are rather long, and with multiple words there are many should terms 
in the final query.
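
To make the BIG* example concrete, here is a small sketch of how a wildcard 
could be expanded against a synonym table. It uses an in-memory dict as a 
stand-in for the table (an assumption for illustration; the real table would 
live in its own index/type), and the entries are made up to mirror the 
example above.

```python
from fnmatch import fnmatchcase

# Hypothetical synonym table: word -> list of synonyms.
SYNONYMS = {
    "BIG": ["LARGE", "HUGE"],
    "BIGHORN": ["SHEEP"],
    "SMALL": ["TINY"],
}


def expand(pattern, table=SYNONYMS):
    """Return every table word matching the wildcard, plus its synonyms."""
    terms = []
    for word in sorted(table):
        if fnmatchcase(word, pattern):
            # Keep the matched word first, then its synonyms, skipping repeats.
            for term in [word] + table[word]:
                if term not in terms:
                    terms.append(term)
    return terms


print(expand("BIG*"))  # ['BIG', 'LARGE', 'HUGE', 'BIGHORN', 'SHEEP']
```

Each returned term would then become one should clause in the final query, 
which is why the clause count grows quickly with long synonym lists.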

And even with two queries (the first to resolve the synonyms, and the 
second to OR across them), responses are remarkably fast. It might be 
pushing Lucene a little, but I like the improved accuracy, and the ability 
to easily and regularly modify my synonym lists without any need to reindex 
the hundreds of millions of documents that I am querying.

So for your question, my suggestion is to go for it; it should perform 
well enough.
