Nolan Lawson created SOLR-4381:
----------------------------------

             Summary: Query-time multi-word synonym expansion
                 Key: SOLR-4381
                 URL: https://issues.apache.org/jira/browse/SOLR-4381
             Project: Solr
          Issue Type: Improvement
            Reporter: Nolan Lawson
            Priority: Minor


This is an issue that seems to come up perennially.

The [Solr docs|http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory]
caution that index-time synonym expansion should be preferred to query-time 
synonym expansion, due to the way multi-word synonyms are treated and how IDF 
values can be boosted artificially. But query-time expansion should have huge 
benefits, given that changes to the synonyms don't require re-indexing, the 
index size stays the same, and the IDF values for the documents don't get 
permanently altered.

The proposed solution is to move the synonym expansion logic out of the 
analysis chain (whether query- or index-time) and into a new QueryParser.  See 
the attached patch for an implementation.

The core Lucene functionality is untouched.  Instead, the EDismaxQParser is 
extended, and synonym expansion is done on-the-fly.  Queries are parsed into a 
lattice (i.e. all possible synonym combinations), while individual components 
of the query are still handled by the EDismaxQParser itself.
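
To make the lattice idea concrete, here is a minimal Python sketch. It is not 
the code in the patch (which is Java and extends the EDismaxQParser); the 
synonym table and the function names build_lattice, enumerate_paths and 
expand_query are hypothetical, chosen only for illustration. Each edge in the 
lattice covers a span of query tokens and carries either the original token or 
a synonym for that span, and every path through the lattice is one candidate 
query.

```python
# Illustrative sketch only: hypothetical names, not the patch's Java code.
# Synonym table: maps a token tuple to alternative token tuples (assumed data).
SYNONYMS = {
    ("dog",): [("hound",), ("canis", "familiaris")],
    ("canis", "familiaris"): [("dog",)],
}


def build_lattice(tokens, synonyms):
    """Map each start position to a list of (end_position, words) edges."""
    edges = {i: [] for i in range(len(tokens))}
    for i in range(len(tokens)):
        # The original token is always a valid edge to the next position.
        edges[i].append((i + 1, (tokens[i],)))
        # Any synonym entry matching the tokens starting at i adds edges
        # that skip over the whole matched span.
        for phrase, alternatives in synonyms.items():
            j = i + len(phrase)
            if tuple(tokens[i:j]) == phrase:
                for alt in alternatives:
                    edges[i].append((j, alt))
    return edges


def enumerate_paths(edges, n, start=0):
    """Yield every path from `start` to position n as a tuple of words."""
    if start == n:
        yield ()
        return
    for end, words in edges[start]:
        for rest in enumerate_paths(edges, n, end):
            yield words + rest


def expand_query(query, synonyms):
    """Return all synonym combinations of a whitespace-tokenized query."""
    tokens = query.lower().split()
    edges = build_lattice(tokens, synonyms)
    return [" ".join(path) for path in enumerate_paths(edges, len(tokens))]
```

Calling expand_query("dog bites", SYNONYMS) yields one string per path through 
the lattice, including the multi-word alternative "canis familiaris bites". In 
the sketch the paths are plain strings; in the approach described above, each 
component would instead be handed to the EDismaxQParser.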

It's not an ideal solution by any stretch. But it's nice and self-contained, so 
it invites experimentation and improvement.  And I think it fits in well with 
the merry band of misfit query parsers, like {{func}} and {{frange}}.

More details about this solution can be found in [this blog 
post|http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/] and 
[the Github page for the 
code|https://github.com/healthonnet/hon-lucene-synonyms].

At the risk of tooting my own horn, I also think this patch sufficiently fixes 
SOLR-3390 (highlighting problems with multi-word synonyms) and LUCENE-4499 
(better support for multi-word synonyms).

