I would not call it "diabolical", though it is perhaps not optimal given Lucene's internal design.
But I do something similar with table-based synonyms. In other words, I do not pre-build synonyms into the document index. Instead, I maintain a table (index/type) of words and their synonyms, query that table, retrieve the synonyms, and then create a second and final query that does an OR search across the word and its synonyms. (It's basically a group of should clauses, just like yours.)

I find that performance is fine, and accuracy and usefulness are superior. For example, a user query for synonyms of the wild-carded BIG* might find BIG, LARGE, HUGE and also BIGHORN, SHEEP. Some of the synonym lists are rather long, and with multiple words there are many should terms in the final query. Even with the two queries (the first to resolve the synonyms, the second to OR across them), performance is remarkably fast.

It might be pushing Lucene a little, but I like the improved accuracy, and the ability to modify my synonym lists easily and regularly without any need to rebuild the hundreds of millions of documents that I am querying. So for your question, my suggestion is to go for it; it should perform well enough.
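
To make the two-step approach concrete, here is a minimal sketch in Python that only builds the query bodies and sends nothing to a cluster. The field names `word`, `synonyms`, and `body`, the lowercase terms, and the shape of the sample hits are assumptions for illustration, not my actual mappings.

```python
from typing import Dict, Iterable, List


def build_synonym_lookup(pattern: str, word_field: str = "word") -> Dict:
    """Step 1: query the synonym table for entries whose word matches the pattern.

    Assumes `word_field` is a non-analyzed field holding lowercase terms,
    since wildcard patterns are matched against the indexed terms as-is.
    """
    return {"query": {"wildcard": {word_field: {"value": pattern}}}}


def collect_terms(
    hits: Iterable[Dict],
    word_field: str = "word",
    synonyms_field: str = "synonyms",
) -> List[str]:
    """Flatten the matched words and their synonyms, keeping first-seen order."""
    terms: List[str] = []
    seen = set()
    for hit in hits:
        source = hit["_source"]
        for term in [source[word_field], *source.get(synonyms_field, [])]:
            if term not in seen:
                seen.add(term)
                terms.append(term)
    return terms


def build_final_query(terms: Iterable[str], text_field: str = "body") -> Dict:
    """Step 2: OR across the word and its synonyms with a group of should clauses."""
    should = [{"match": {text_field: term}} for term in terms]
    return {"query": {"bool": {"should": should, "minimum_should_match": 1}}}


# Hypothetical hits, shaped like the "hits" array of a search response
# against the synonym table for the pattern "big*".
sample_hits = [
    {"_source": {"word": "big", "synonyms": ["large", "huge"]}},
    {"_source": {"word": "bighorn", "synonyms": ["sheep"]}},
]

lookup_query = build_synonym_lookup("big*")
terms = collect_terms(sample_hits)
final_query = build_final_query(terms)
```

In practice `lookup_query` would be run against the synonym index first, and `final_query` against the document index. Because the synonyms live in their own small index, editing a synonym list only means reindexing that one entry, not the documents being searched.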
