[ 
https://issues.apache.org/jira/browse/SOLR-4381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13566539#comment-13566539
 ] 

Jack Krupansky commented on SOLR-4381:
--------------------------------------

I have personally implemented multi-word synonym support within a query parser, 
bypassing analysis for synonym processing as you suggest, but still examining 
the analysis chain to discover and load the field-specific synonym table. Yes, 
that approach can work, but I have refrained from proposing such a solution in 
Solr/Lucene since it is rather messy and not really an ideal solution because 
it does bypass analysis. There are ongoing discussions on the Lucene/Solr lists 
about how best to address query-time synonym processing; there have actually 
been some hopeful suggestions recently, but still a long way to go. I would 
rather see those discussions continue and come to fruition than see edismax 
changed in a way that would be incompatible with a more ideal solution.

I suppose you could simply have your patch remain a patch forever without 
integration into the Solr code base, for people who are desperate to have the 
feature in edismax, but due to its far-from-ideal nature (bypassing analysis 
and not supporting field-specific synonym tables), it would seem less likely to 
be integrated into the Solr code base since it would interfere with a broader 
solution. Note that I am NOT a committer, so I would have no official say in 
the matter. This is just my own opinion.

I suppose you could also package it as a separate "contrib" query parser and 
then it could be integrated into a Solr release and be available to anybody 
without the need for patching. That might be the more fruitful approach for 
near-term integration.

But I would definitely be -1 for direct integration into edismax since it does 
bypass analysis (and, as an incidental objection, does not support 
field-specific synonym tables). Analysis is really important and gives the 
developer fine-tuning control over field-specific processing without changing 
any code.

OTOH, if it could be turned on and off dynamically with a request parameter, 
maybe direct integration into the Solr code base would be feasible. IOW, if it 
is simply a user-selectable "plugin", that would be more compelling.

Again, I am not a committer, so my opinion here can be freely ignored.

> Query-time multi-word synonym expansion
> ---------------------------------------
>
>                 Key: SOLR-4381
>                 URL: https://issues.apache.org/jira/browse/SOLR-4381
>             Project: Solr
>          Issue Type: Improvement
>          Components: query parsers
>            Reporter: Nolan Lawson
>            Priority: Minor
>              Labels: multi-word, queryparser, synonyms
>             Fix For: 4.2, 5.0
>
>         Attachments: SOLR-4381.patch
>
>
> This is an issue that seems to come up perennially.
> The [Solr 
> docs|http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory]
>  caution that index-time synonym expansion should be preferred to query-time 
> synonym expansion, due to the way multi-word synonyms are treated and how IDF 
> values can be boosted artificially. But query-time expansion should have huge 
> benefits, given that changes to the synonyms don't require re-indexing, the 
> index size stays the same, and the IDF values for the documents don't get 
> permanently altered.
> The proposed solution is to move the synonym expansion logic out of the 
> analysis chain (either query- or index-time) and into a new QueryParser.  See 
> the attached patch for an implementation.
> The core Lucene functionality is untouched.  Instead, the EDismaxQParser is 
> extended, and synonym expansion is done on-the-fly.  Queries are parsed into 
> a lattice (i.e. all possible synonym combinations), while individual 
> components of the query are still handled by the EDismaxQParser itself.
> It's not an ideal solution by any stretch. But it's nice and self-contained, 
> so it invites experimentation and improvement.  And I think it fits in well 
> with the merry band of misfit query parsers, like {{func}} and {{frange}}.
> More details about this solution can be found in [this blog 
> post|http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/] and 
> [the Github page for the 
> code|https://github.com/healthonnet/hon-lucene-synonyms].
> At the risk of tooting my own horn, I also think this patch sufficiently 
> fixes SOLR-3390 (highlighting problems with multi-word synonyms) and 
> LUCENE-4499 (better support for multi-word synonyms).
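
For readers unfamiliar with the "lattice" idea in the description above, here 
is a minimal sketch of what enumerating all synonym combinations for a query 
could look like. It is not the code from the attached patch and it does not 
call Solr or the EDismaxQParser. The synonym table, the function names 
(segment_alternatives, expand) and the greedy longest-match segmentation are 
all assumptions made for illustration only. A real implementation would also 
have to deal with overlapping matches and would hand each combination to the 
underlying parser.

```python
from itertools import product

# Hypothetical synonym table: each phrase maps to its equivalent phrases.
SYNONYMS = {
    "dog": ["hound", "canis familiaris"],
    "new york": ["nyc"],
}


def segment_alternatives(tokens, synonyms, max_len=3):
    """Split tokens into slots; each slot lists the alternatives for one span.

    Assumption: greedy longest match, left to right. At each position the
    longest phrase (up to max_len tokens) found in the table wins.
    """
    slots = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in synonyms:
                # The original phrase plus every synonym fills this slot.
                slots.append([phrase] + synonyms[phrase])
                i += n
                break
        else:
            # No phrase starting here is in the table: keep the token as-is.
            slots.append([tokens[i]])
            i += 1
    return slots


def expand(query, synonyms, max_len=3):
    """Return every combination of alternatives, one query string each."""
    slots = segment_alternatives(query.lower().split(), synonyms, max_len)
    return [" ".join(choice) for choice in product(*slots)]


if __name__ == "__main__":
    for q in expand("dog in New York", SYNONYMS):
        print(q)
```

Each string returned by expand() corresponds to one path through the lattice. 
Because the expansion happens on the raw query text, multi-word entries such 
as "new york" are matched before any whitespace tokenization would split them, 
which is the part that query-time analysis makes difficult.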
