The examples I've seen so far are single words. But I learned something new today: the number of "synonyms" returned for a word may be in the hundreds, sometimes even thousands. So I'm not sure query-time synonyms will work at all. What do you think?

Shai

On Thu, Jul 18, 2013 at 3:21 PM, Jack Krupansky <j...@basetechnology.com> wrote:

> Your best bet is to preprocess queries and expand synonyms in your own
> application layer. The Lucene/Solr synonym implementation, design, and
> architecture are fairly lightweight (although FST is a big improvement) and
> not architected for large and dynamic synonym sets.
>
> Do you need multi-word phrase synonyms as well, or is this strictly
> single-word synonyms?
>
> -- Jack Krupansky
>
> *From:* Shai Erera <ser...@gmail.com>
> *Sent:* Thursday, July 18, 2013 1:36 AM
> *To:* dev@lucene.apache.org
> *Subject:* Programmatic Synonyms Filter (Lucene and/or Solr)
>
> Hi
>
> I was asked to integrate with a system which provides synonyms for words
> through API. I checked the existing synonym filters in Lucene and Solr and
> they all seem to take a synonyms map up front.
>
> E.g. Lucene's SynonymFilter takes a SynonymMap which exposes an FST, so
> it's not really programmatic in the sense that I can provide an impl which
> will pull the synonyms through the other system's API.
>
> Solr SynonymFilterFactory just loads the synonyms from a file into a
> SynonymMap, and then uses Lucene's SynonymFilter, so it doesn't look like I
> can extend that one either.
>
> The problem is that the synonyms DB I should integrate with is HUGE and
> will probably not fit in RAM (SynonymMap). Nor is it currently possible to
> pull all available synonyms from it in one go. The API I have is something
> like String[] getSynonyms(String word).
>
> So I have a few questions:
>
> 1) Did I miss a Filter which does take a programmatic syn-map which I can
> provide my own impl to?
>
> 2) If not, would it make sense to modify SynonymMap to offer
> getSynonyms(word) API (using BytesRef / CharsRef of course), with an
> FSTSynonymMap default impl so that users can provide their own impl, e.g.
> not requiring everything to be in RAM?
>
> 2.1) Side-effect benefit, I think, is that we won't require everyone to
> deal with the FST API that way, though I'll admit I cannot think of many use
> cases for not using SynonymFilter as-is ...
>
> 3) If the answer to (1) and (2) is NO, I guess my only option is to
> implement my own SynonymFilter, copying most of the code from Lucene's ...
> right?
>
> Shai
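
For illustration only, here is a rough sketch of what the application-layer expansion Jack suggests might look like when the only API available is something like String[] getSynonyms(String word). Everything in it is an assumption made for the example: the SynonymSource interface, the SynonymQueryExpander class, and the per-term cap (added because a word may come back with hundreds or thousands of synonyms) are hypothetical names, not Lucene or Solr classes. It uses plain Java, builds a simple OR-group string, and does not escape query-parser special characters.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the external synonyms API mentioned in the thread. */
interface SynonymSource {
    String[] getSynonyms(String word);
}

/** Sketch: expands each query term into an OR group, capping synonyms per term. */
class SynonymQueryExpander {
    private final SynonymSource source;
    private final int maxSynonymsPerTerm; // assumed cap; not part of the original API

    SynonymQueryExpander(SynonymSource source, int maxSynonymsPerTerm) {
        this.source = source;
        this.maxSynonymsPerTerm = maxSynonymsPerTerm;
    }

    /** Returns the term followed by at most maxSynonymsPerTerm distinct synonyms. */
    List<String> expandTerm(String term) {
        List<String> out = new ArrayList<>();
        out.add(term);
        String[] synonyms = source.getSynonyms(term);
        if (synonyms == null) {
            return out;
        }
        for (String s : synonyms) {
            if (out.size() > maxSynonymsPerTerm) {
                break; // the list already holds the term plus the allowed synonyms
            }
            if (s != null && !s.isEmpty() && !out.contains(s)) {
                out.add(s);
            }
        }
        return out;
    }

    /** Rewrites "a b" as "(a OR a1 OR a2) b"; multi-word synonyms are quoted. */
    String expand(String query) {
        StringBuilder sb = new StringBuilder();
        for (String term : query.trim().split("\\s+")) {
            if (term.isEmpty()) {
                continue;
            }
            if (sb.length() > 0) {
                sb.append(' ');
            }
            List<String> alternatives = expandTerm(term);
            if (alternatives.size() == 1) {
                sb.append(term);
                continue;
            }
            List<String> quoted = new ArrayList<>();
            for (String alt : alternatives) {
                quoted.add(alt.contains(" ") ? "\"" + alt + "\"" : alt);
            }
            sb.append('(').append(String.join(" OR ", quoted)).append(')');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Toy in-memory source standing in for the remote synonyms system.
        SynonymSource toy = w -> w.equals("fast")
                ? new String[] {"quick", "rapid", "speedy"}
                : new String[0];
        System.out.println(new SynonymQueryExpander(toy, 2).expand("fast car"));
    }
}
```

The cap keeps the expanded query bounded no matter how many synonyms the source returns. Which synonyms to keep (the first N here) is an arbitrary choice in this sketch, and a real integration would probably want some ranking from the synonyms system.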