Your best bet is to preprocess queries and expand synonyms in your own application layer. The Lucene/Solr synonym implementation is fairly lightweight (although the FST is a big improvement) and is not designed for large, dynamic synonym sets.
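
A minimal sketch of what application-layer expansion could look like, assuming the external system exposes something like the String[] getSynonyms(String word) call described in the original message. The names QueryExpander and SynonymProvider are hypothetical and not part of Lucene or Solr. The output uses the OR/parenthesis grouping of the classic query parser syntax, and the sketch does not escape query-syntax special characters.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: expands each query term into an OR group of the
// term plus its synonyms before the query string is handed to the parser.
final class QueryExpander {

    /** Hypothetical stand-in for the external synonym service. */
    interface SynonymProvider {
        String[] getSynonyms(String word);
    }

    static String expand(String query, SynonymProvider provider) {
        List<String> clauses = new ArrayList<>();
        for (String term : query.trim().split("\\s+")) {
            if (term.isEmpty()) {
                continue; // an empty query yields a single empty token
            }
            String[] synonyms = provider.getSynonyms(term);
            if (synonyms == null || synonyms.length == 0) {
                clauses.add(term); // no synonyms: keep the term unchanged
                continue;
            }
            StringBuilder group = new StringBuilder("(").append(term);
            for (String synonym : synonyms) {
                group.append(" OR ").append(quoteIfPhrase(synonym));
            }
            clauses.add(group.append(')').toString());
        }
        return String.join(" ", clauses);
    }

    // Multi-word synonyms are wrapped in quotes so they parse as phrases.
    private static String quoteIfPhrase(String s) {
        return s.contains(" ") ? "\"" + s + "\"" : s;
    }
}
```

Since every term triggers a call to the remote service, a small in-process cache in front of the provider would likely be worthwhile for frequent terms.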

Do you need multi-word phrase synonyms as well, or is this strictly single-word synonyms?

-- Jack Krupansky

From: Shai Erera
Sent: Thursday, July 18, 2013 1:36 AM
To: dev@lucene.apache.org
Subject: Programmatic Synonyms Filter (Lucene and/or Solr)

Hi

I was asked to integrate with a system which provides synonyms for words through an API. I checked the existing synonym filters in Lucene and Solr and they all seem to take a synonyms map up front.

E.g. Lucene's SynonymFilter takes a SynonymMap which exposes an FST, so it's not really programmatic in the sense that I can provide an impl which will pull the synonyms through the other system's API.

Solr's SynonymFilterFactory just loads the synonyms from a file into a SynonymMap, and then uses Lucene's SynonymFilter, so it doesn't look like I can extend that one either.

The problem is that the synonyms DB I should integrate with is HUGE and will probably not fit in RAM (SynonymMap). Nor is it currently possible to pull all available synonyms from it in one go. The API I have is something like String[] getSynonyms(String word).

So I have a few questions:

1) Did I miss a Filter which does take a programmatic syn-map which I can provide my own impl to?

2) If not, would it make sense to modify SynonymMap to offer a getSynonyms(word) API (using BytesRef / CharsRef of course), with an FSTSynonymMap default impl so that users can provide their own impl, e.g. not requiring everything to be in RAM?

2.1) Side-effect benefit, I think, is that we won't require everyone to deal with the FST API that way, though I'll admit I cannot think of many use cases for not using SynonymFilter as-is ...

3) If the answer to (1) and (2) is NO, I guess my only option is to implement my own SynonymFilter, copying most of the code from Lucene's ... right?

Shai