Fixing query-time multi-word synonym issue

Otis Gospodnetic Tue, 22 Jan 2013 07:18:14 -0800

Hello,

I'm looking for some guidance on solving the infamous index-time vs.
query-time multi-word synonym problem.  I'd like help understanding the
pieces and effort involved, and I'm also on the lookout for any potential
"man, it will take you forever, you'll have to do major Lucene surgery"
type of warnings.


I have never looked deeply into this problem, and my understanding is that
multi-word synonyms don't work at query time because QueryParser(?) simply
splits queries on whitespace before analysis, which makes it impossible for
SynonymTokenFilter (?) to "see" the unbroken token sequence and do synonym
expansion.
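
As a toy illustration of that understanding, here is a minimal sketch in
plain Python. It is not Lucene or Solr code: the synonym map, expand(), and
the two analyze_* functions are hypothetical names invented for this example,
and the real QueryParser and synonym filter may well behave differently in
the details. It only shows why a filter that is handed one whitespace-
separated chunk at a time cannot match a phrase that spans several chunks.

```python
# Toy model only: SYNONYMS, expand() and the analyze_* helpers are
# hypothetical and do not correspond to any Lucene or Solr API.
SYNONYMS = {("domain", "name", "system"): "dns"}


def expand(tokens):
    """Copy tokens to the output, appending a synonym after each matched phrase."""
    out = []
    i = 0
    while i < len(tokens):
        for phrase, synonym in SYNONYMS.items():
            if tuple(tokens[i:i + len(phrase)]) == phrase:
                out.extend(phrase)   # keep the original words
                out.append(synonym)  # add the synonym next to them
                i += len(phrase)
                break
        else:
            # No phrase starts at position i: pass the token through.
            out.append(tokens[i])
            i += 1
    return out


def analyze_whole(text):
    """Index-time style: the filter sees the full token sequence."""
    return expand(text.lower().split())


def analyze_per_chunk(text):
    """Query-time style: text is split on whitespace first, and each
    chunk is analyzed on its own, so no multi-word phrase can match."""
    out = []
    for chunk in text.lower().split():
        out.extend(expand([chunk]))
    return out


if __name__ == "__main__":
    query = "domain name system lookup"
    print(analyze_whole(query))
    print(analyze_per_chunk(query))
```

In this sketch only analyze_whole() adds "dns" to the output; the per-chunk
variant returns the four original words unchanged.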

I think this is also documented on the Wiki.
Are there other pieces involved that I didn't mention, but should have?

The following are 3 different efforts I found:
https://issues.apache.org/jira/browse/LUCENE-4499
http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/
http://www.ub.uni-bielefeld.de/~befehl/base/solr/eurovoc.html

Plus Jack's proposal:
http://search-lucene.com/m/Zkj0k15dDGP1

Do any of the above approaches sound like the right one, or at least a step
in the right direction, and does any of them stand a chance of being accepted?

Thanks,
Otis
