The examples I've seen so far are single words. But I learned something
new today: the number of "synonyms" returned for a word may be in the
hundreds, sometimes even thousands.
So I'm not sure query-time synonyms will work at all. What do you think?
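
To make the concern concrete, here is a rough sketch of what application-layer
expansion with a hard cap might look like. Only the getSynonyms(String) shape
comes from the API described below; QueryExpander, SynonymSource and
maxSynonyms are made-up names for illustration. It ignores multi-word synonyms
and does not escape query syntax. Without some cap, a few terms with thousands
of synonyms each would exceed BooleanQuery's max clause count, which as far as
I know defaults to 1024.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class QueryExpander {

    /** Hypothetical wrapper around the external service; mirrors its getSynonyms(word) call. */
    public interface SynonymSource {
        String[] getSynonyms(String word);
    }

    private final SynonymSource source;
    private final int maxSynonyms; // assumed knob: upper bound on synonyms kept per term

    public QueryExpander(SynonymSource source, int maxSynonyms) {
        this.source = source;
        this.maxSynonyms = maxSynonyms;
    }

    /** Returns the word itself followed by at most maxSynonyms distinct synonyms. */
    public List<String> expandTerm(String word) {
        Set<String> terms = new LinkedHashSet<>();
        terms.add(word);
        String[] synonyms = source.getSynonyms(word);
        if (synonyms != null) {
            for (String s : synonyms) {
                if (terms.size() > maxSynonyms) {
                    break; // original word plus maxSynonyms synonyms already collected
                }
                terms.add(s);
            }
        }
        return new ArrayList<>(terms);
    }

    /** Rewrites each whitespace-separated term as an OR group, e.g. (car OR auto). */
    public String expandQuery(String query) {
        StringBuilder sb = new StringBuilder();
        for (String word : query.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue;
            }
            if (sb.length() > 0) {
                sb.append(' ');
            }
            List<String> terms = expandTerm(word);
            if (terms.size() == 1) {
                sb.append(word); // no synonyms: leave the term as-is
            } else {
                sb.append('(').append(String.join(" OR ", terms)).append(')');
            }
        }
        return sb.toString();
    }
}
```

The cap simply keeps the first N synonyms in whatever order the service
returns them, which is only reasonable if that order reflects relevance. That
is an assumption I have not verified.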

Shai


On Thu, Jul 18, 2013 at 3:21 PM, Jack Krupansky <j...@basetechnology.com> wrote:

> Your best bet is to preprocess queries and expand synonyms in your own
> application layer. The Lucene/Solr synonym implementation, design, and
> architecture are fairly lightweight (although FST is a big improvement) and
> not architected for large and dynamic synonym sets.
>
> Do you need multi-word phrase synonyms as well, or is this strictly
> single-word synonyms?
>
> -- Jack Krupansky
>
>  *From:* Shai Erera <ser...@gmail.com>
> *Sent:* Thursday, July 18, 2013 1:36 AM
> *To:* dev@lucene.apache.org
> *Subject:* Programmatic Synonyms Filter (Lucene and/or Solr)
>
>     Hi
>
> I was asked to integrate with a system which provides synonyms for words
> through API. I checked the existing synonym filters in Lucene and Solr and
> they all seem to take a synonyms map up front.
>
> E.g. Lucene's SynonymFilter takes a SynonymMap which exposes an FST, so
> it's not really programmatic in the sense that I can provide an impl which
> will pull the synonyms through the other system's API.
>
> Solr SynonymFilterFactory just loads the synonyms from a file into a
> SynonymMap, and then uses Lucene's SynonymFilter, so it doesn't look like I
> can extend that one either.
>
> The problem is that the synonyms DB I should integrate with is HUGE and
> will probably not fit in RAM (SynonymMap). Nor is it currently possible to
> pull all available synonyms from it in one go. The API I have is something
> like String[] getSynonyms(String word).
>
> So I have a few questions:
>
> 1) Did I miss a Filter which does take a programmatic syn-map which I can
> provide my own impl to?
>
> 2) If not, would it make sense to modify SynonymMap to offer
> getSynonyms(word) API (using BytesRef / CharsRef of course), with an
> FSTSynonymMap default impl so that users can provide their own impl, e.g.
> not requiring everything to be in RAM?
>
> 2.1) Side-effect benefit, I think, is that we won't require everyone to
> deal with the FST API that way, though I'll admit I cannot think of many use
> cases for not using SynonymFilter as-is ...
>
> 3) If the answer to (1) and (2) is NO, I guess my only option is to
> implement my own SynonymFilter, copying most of the code from Lucene's ...
> right?
>
> Shai
>
