[ https://issues.apache.org/jira/browse/SOLR-4381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jan Høydahl updated SOLR-4381:
------------------------------

      Component/s: query parsers
    Fix Version/s: 5.0
                   4.2

Hi. Well-written blog post! I agree that the synonym feature is better 
implemented above the analysis chain, so a query parser (QP) fits well. The 
question is whether each query parser would need its own implementation, or 
whether it could be generalized.

Also, I quite like the fact that analysis-time synonyms allow different 
dictionaries per field, so that if you have qf=text_en text_de to search two 
languages at the same time, each field can expand synonyms differently. One way 
to support that in your approach would be for the QP to inspect the query 
analysis chain of each field in qf and, if it finds a SynonymFilterFactory, 
use that dictionary instead of the global one (and of course disable the 
analysis filter). eDisMax already uses this trick for conditional stopword 
handling. Such an approach would make it easier for people to migrate from 
their current setup to this solution.
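A minimal sketch of the per-field lookup suggested above, written in Java since that is Solr's language. It assumes the per-field dictionaries have already been extracted from each field's query analyzer; the class PerFieldSynonyms and its methods are hypothetical (not part of Solr or the attached patch), and plain Maps stand in for real synonym dictionaries.

```java
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: pick a synonym dictionary per qf field, falling back to
 * a global dictionary when the field has none of its own. In Solr the per-field
 * map would presumably be filled by inspecting each field's query analyzer for
 * a SynonymFilterFactory; here it is just a plain Map supplied by the caller.
 */
public class PerFieldSynonyms {
    private final Map<String, Map<String, List<String>>> perField;
    private final Map<String, List<String>> global;

    public PerFieldSynonyms(Map<String, Map<String, List<String>>> perField,
                            Map<String, List<String>> global) {
        this.perField = perField;
        this.global = global;
    }

    /** Returns the field's own dictionary, or the global one if it has none. */
    public Map<String, List<String>> dictionaryFor(String field) {
        return perField.getOrDefault(field, global);
    }

    /** Returns the synonyms of a term for a field, or an empty list. */
    public List<String> synonyms(String field, String term) {
        return dictionaryFor(field).getOrDefault(term, List.of());
    }

    public static void main(String[] args) {
        Map<String, List<String>> en = Map.of("car", List.of("automobile", "auto"));
        Map<String, List<String>> de = Map.of("auto", List.of("wagen"));
        Map<String, List<String>> global = Map.of("car", List.of("vehicle"));
        PerFieldSynonyms s = new PerFieldSynonyms(
            Map.of("text_en", en, "text_de", de), global);

        System.out.println(s.synonyms("text_en", "car"));  // [automobile, auto]
        System.out.println(s.synonyms("text_de", "auto")); // [wagen]
        System.out.println(s.synonyms("title", "car"));    // [vehicle] (global fallback)
    }
}
```

The fallback keeps today's behaviour for fields without their own dictionary, which is what would make the migration path painless.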

I have not tested the patch yet. But I absolutely like the concept!
                
> Query-time multi-word synonym expansion
> ---------------------------------------
>
>                 Key: SOLR-4381
>                 URL: https://issues.apache.org/jira/browse/SOLR-4381
>             Project: Solr
>          Issue Type: Improvement
>          Components: query parsers
>            Reporter: Nolan Lawson
>            Priority: Minor
>              Labels: multi-word, queryparser, synonyms
>             Fix For: 4.2, 5.0
>
>         Attachments: SOLR-4381.patch
>
>
> This is an issue that seems to come up perennially.
> The [Solr 
> docs|http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory]
>  caution that index-time synonym expansion should be preferred to query-time 
> synonym expansion, due to the way multi-word synonyms are treated and how IDF 
> values can be boosted artificially. But query-time expansion should have huge 
> benefits, given that changes to the synonyms don't require re-indexing, the 
> index size stays the same, and the IDF values for the documents don't get 
> permanently altered.
> The proposed solution is to move the synonym expansion logic out of the 
> analysis chain (either query- or index-time) and into a new QueryParser. See 
> the attached patch for an implementation.
> The core Lucene functionality is untouched.  Instead, the EDismaxQParser is 
> extended, and synonym expansion is done on-the-fly.  Queries are parsed into 
> a lattice (i.e. all possible synonym combinations), while individual 
> components of the query are still handled by the EDismaxQParser itself.
> It's not an ideal solution by any stretch. But it's nice and self-contained, 
> so it invites experimentation and improvement.  And I think it fits in well 
> with the merry band of misfit query parsers, like {{func}} and {{frange}}.
> More details about this solution can be found in [this blog 
> post|http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/] and 
> [the Github page for the 
> code|https://github.com/healthonnet/hon-lucene-synonyms].
> At the risk of tooting my own horn, I also think this patch sufficiently 
> fixes SOLR-3390 (highlighting problems with multi-word synonyms) and 
> LUCENE-4499 (better support for multi-word synonyms).
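To make the "lattice" idea concrete, here is a rough Java sketch that enumerates every combination of (possibly multi-word) synonym substitutions for an already-tokenized query. It is an illustration under assumed inputs, not the algorithm from SOLR-4381.patch or hon-lucene-synonyms; SynonymLattice and expand are hypothetical names, and each resulting string would still need to be handed to eDisMax for the per-component parsing described above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch (not the patch's code): expand a tokenized query into all
 * combinations of synonym substitutions. Dictionary keys are space-separated
 * phrases, so multi-word synonyms on either side are supported.
 */
public class SynonymLattice {

    public static List<String> expand(List<String> tokens, Map<String, List<String>> synonyms) {
        List<String> out = new ArrayList<>();
        expandFrom(tokens, 0, new ArrayList<>(), synonyms, out);
        return out;
    }

    private static void expandFrom(List<String> tokens, int pos, List<String> prefix,
                                   Map<String, List<String>> synonyms, List<String> out) {
        if (pos == tokens.size()) {
            out.add(String.join(" ", prefix)); // one complete path through the lattice
            return;
        }
        // Path 1: keep the current token unchanged.
        prefix.add(tokens.get(pos));
        expandFrom(tokens, pos + 1, prefix, synonyms, out);
        prefix.remove(prefix.size() - 1);

        // Path 2: for each phrase starting at pos that has synonyms,
        // substitute every alternative and skip past the whole phrase.
        for (int end = pos + 1; end <= tokens.size(); end++) {
            String phrase = String.join(" ", tokens.subList(pos, end));
            for (String alt : synonyms.getOrDefault(phrase, List.of())) {
                prefix.add(alt);
                expandFrom(tokens, end, prefix, synonyms, out);
                prefix.remove(prefix.size() - 1);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> syn = Map.of(
            "dog", List.of("canine"),
            "dog bite", List.of("dog attack"));
        List<String> query = Arrays.asList("dog", "bite", "treatment");
        expand(query, syn).forEach(System.out::println);
    }
}
```

Running main prints three variants: "dog bite treatment", "canine bite treatment" and "dog attack treatment". The number of variants grows multiplicatively with overlapping matches, so a real parser would probably want to cap or deduplicate them.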

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
