Your best bet is to preprocess queries and expand synonyms in your own 
application layer. The Lucene/Solr synonym implementation, design, and 
architecture are fairly lightweight (although the FST is a big improvement) and 
were not designed for large and dynamic synonym sets.
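
A minimal sketch of such application-layer expansion, in Java. SynonymSource 
and QueryExpander are hypothetical names, not Lucene or Solr classes; the 
getSynonyms signature follows the one described in the original message below, 
and the OR / quoting output assumes the classic Lucene query parser syntax:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical view of the external synonym API: String[] getSynonyms(String word). */
interface SynonymSource {
    String[] getSynonyms(String word);
}

/** Hypothetical application-layer expander; not part of Lucene or Solr. */
class QueryExpander {

    /**
     * Rewrites each whitespace-separated term as an OR group of the term and
     * its synonyms. Assumption: terms contain no query-syntax special
     * characters, so nothing is escaped here.
     */
    static String expand(String query, SynonymSource source) {
        List<String> clauses = new ArrayList<>();
        for (String term : query.trim().split("\\s+")) {
            if (term.isEmpty()) {
                continue; // happens only for an empty or blank query
            }
            // LinkedHashSet keeps the original term first and drops duplicates.
            Set<String> alternatives = new LinkedHashSet<>();
            alternatives.add(term);
            String[] synonyms = source.getSynonyms(term);
            if (synonyms != null) {
                for (String synonym : synonyms) {
                    alternatives.add(synonym);
                }
            }
            if (alternatives.size() == 1) {
                clauses.add(term); // no synonyms: leave the term untouched
            } else {
                List<String> parts = new ArrayList<>();
                for (String alternative : alternatives) {
                    parts.add(quoteIfPhrase(alternative));
                }
                clauses.add("(" + String.join(" OR ", parts) + ")");
            }
        }
        return String.join(" ", clauses);
    }

    /** Multi-word synonyms become quoted phrases. */
    private static String quoteIfPhrase(String text) {
        return text.contains(" ") ? "\"" + text + "\"" : text;
    }
}
```

With a source that maps "car" to "automobile" and "motor vehicle", 
expand("fast car", source) yields: fast (car OR automobile OR "motor vehicle"). 
The expansion happens once per query, so the synonym set never has to be loaded 
into the index or into RAM.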

Do you need multi-word phrase synonyms as well, or is this strictly single-word 
synonyms?

-- Jack Krupansky

From: Shai Erera 
Sent: Thursday, July 18, 2013 1:36 AM
To: dev@lucene.apache.org 
Subject: Programmatic Synonyms Filter (Lucene and/or Solr)

Hi


I was asked to integrate with a system which provides synonyms for words 
through an API. I checked the existing synonym filters in Lucene and Solr and 
they all seem to take a synonyms map up front. 

E.g. Lucene's SynonymFilter takes a SynonymMap which exposes an FST, so it's 
not really programmatic in the sense that I can provide an impl which will pull 
the synonyms through the other system's API.


Solr's SynonymFilterFactory just loads the synonyms from a file into a 
SynonymMap, and then uses Lucene's SynonymFilter, so it doesn't look like I can 
extend that one either.


The problem is that the synonyms DB I should integrate with is HUGE and will 
probably not fit in RAM (SynonymMap). Nor is it currently possible to pull all 
available synonyms from it in one go. The API I have is something like String[] 
getSynonyms(String word).


So I have a few questions:


1) Did I miss a Filter which does take a programmatic syn-map which I can 
provide my own impl to?


2) If not, would it make sense to modify SynonymMap to offer a 
getSynonyms(word) API (using BytesRef / CharsRef of course), with an 
FSTSynonymMap default impl so that users can provide their own impl, e.g. not 
requiring everything to be in RAM?
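
To make the idea in (2) concrete, here is a rough sketch in plain Java. All 
names (SynonymProvider, InMemorySynonymProvider, CachingSynonymProvider) are 
hypothetical and none of this is existing Lucene API; a HashMap stands in for 
the FST-backed default, String is used instead of BytesRef / CharsRef to keep 
it self-contained, and the bounded cache is just one possible way to front a 
slow external lookup:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical abstraction; Lucene's SynonymMap does not expose this method. */
interface SynonymProvider {
    List<String> getSynonyms(String word);
}

/** Default impl holding everything in RAM; the HashMap stands in for the FST. */
class InMemorySynonymProvider implements SynonymProvider {
    private final Map<String, List<String>> map = new HashMap<>();

    void add(String word, String... synonyms) {
        map.computeIfAbsent(word, k -> new ArrayList<>()).addAll(Arrays.asList(synonyms));
    }

    @Override
    public List<String> getSynonyms(String word) {
        return map.getOrDefault(word, Collections.<String>emptyList());
    }
}

/**
 * Wraps any provider (e.g. one backed by an external API) with a bounded LRU
 * cache, so only recently used words are held in memory. Not thread-safe.
 */
class CachingSynonymProvider implements SynonymProvider {
    private final SynonymProvider delegate;
    private final Map<String, List<String>> cache;

    CachingSynonymProvider(SynonymProvider delegate, final int maxEntries) {
        this.delegate = delegate;
        // accessOrder=true makes LinkedHashMap evict the least recently used entry.
        this.cache = new LinkedHashMap<String, List<String>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, List<String>> eldest) {
                return size() > maxEntries;
            }
        };
    }

    @Override
    public List<String> getSynonyms(String word) {
        List<String> synonyms = cache.get(word);
        if (synonyms == null) {
            synonyms = delegate.getSynonyms(word); // only cache misses reach the delegate
            cache.put(word, synonyms);
        }
        return synonyms;
    }
}
```

A filter written against SynonymProvider would ask for synonyms one token at a 
time, so an impl backed by the external system could be plugged in without 
building a SynonymMap up front.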


2.1) Side-effect benefit, I think, is that we won't require everyone to deal 
with the FST API that way, though I'll admit I cannot think of many use cases 
for not using SynonymFilter as-is ...


3) If the answer to (1) and (2) is NO, I guess my only option is to implement 
my own SynonymFilter, copying most of the code from Lucene's ... right?

Shai
