[jira] [Commented] (SOLR-4381) Query-time multi-word synonym expansion

Jack Krupansky (JIRA) Thu, 31 Jan 2013 06:45:37 -0800

    [ 
https://issues.apache.org/jira/browse/SOLR-4381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13567675#comment-13567675
 ]


Jack Krupansky commented on SOLR-4381:
--------------------------------------

If this issue is to be seriously pursued as part of edismax, the following 
should be included here in JIRA:

1. A concise summary of the overall approach, with key technical details.

2. A few example queries, both source and the resulting "parsed query". Key 
test cases, if you will.

3. A semi-detailed summary of what the user of the change needs to know, in 
terms of how to set it up, manage it, use it, and its precise effects.

4. Detail any limitations.

That said, if you were to implement this as part of a standalone, "contrib" 
query parser, you would be much freer to do whatever you want with no regard to 
potential consequences and need not worry about fine details. But if you want 
this to be part of edismax, you'll need to be very, very careful. I would 
suggest the former - it would allow you to get going much more rapidly. 
Integration with edismax proper could be deferred until you're happy that 
you've done all you've intended to do - and meanwhile the contrib module would 
be available for others to use out of the box.



5. Specifically what features of the Synonym Filter will be lost by using this 
approach.
                
> Query-time multi-word synonym expansion
> ---------------------------------------
>
>                 Key: SOLR-4381
>                 URL: https://issues.apache.org/jira/browse/SOLR-4381
>             Project: Solr
>          Issue Type: Improvement
>          Components: query parsers
>            Reporter: Nolan Lawson
>            Priority: Minor
>              Labels: multi-word, queryparser, synonyms
>             Fix For: 4.2, 5.0
>
>         Attachments: SOLR-4381-2.patch, SOLR-4381.patch
>
>
> This is an issue that seems to come up perennially.
> The [Solr 
> docs|http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory]
>  caution that index-time synonym expansion should be preferred to query-time 
> synonym expansion, due to the way multi-word synonyms are treated and how IDF 
> values can be boosted artificially. But query-time expansion should have huge 
> benefits, given that changes to the synonyms don't require re-indexing, the 
> index size stays the same, and the IDF values for the documents don't get 
> permanently altered.
> The proposed solution is to move the synonym expansion logic from the 
> analysis chain (either query- or index-time) and into a new QueryParser.  See 
> the attached patch for an implementation.
> The core Lucene functionality is untouched.  Instead, the EDismaxQParser is 
> extended, and synonym expansion is done on-the-fly.  Queries are parsed into 
> a lattice (i.e. all possible synonym combinations), while individual 
> components of the query are still handled by the EDismaxQParser itself.
> It's not an ideal solution by any stretch. But it's nice and self-contained, 
> so it invites experimentation and improvement.  And I think it fits in well 
> with the merry band of misfit query parsers, like {{func}} and {{frange}}.
> More details about this solution can be found in [this blog 
> post|http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/] and 
> [the Github page for the 
> code|https://github.com/healthonnet/hon-lucene-synonyms].
> At the risk of tooting my own horn, I also think this patch sufficiently 
> fixes SOLR-3390 (highlighting problems with multi-word synonyms) and 
> LUCENE-4499 (better support for multi-word synonyms).
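
For readers of this thread, the "lattice" idea in the issue description above (every possible synonym combination of the query) can be illustrated with a small standalone sketch. This is not the code from the attached patch: the class name SynonymLattice, the in-memory synonym map, and the use of plain strings instead of Lucene Query objects are all assumptions made for illustration. In the proposed parser, each expanded form would presumably be handed to the EDismaxQParser rather than printed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not taken from the SOLR-4381 patch.
class SynonymLattice {

    // Assumed synonym table: a phrase (tokens joined by one space)
    // maps to the alternatives that may replace it.
    private final Map<String, List<String>> synonyms;

    SynonymLattice(Map<String, List<String>> synonyms) {
        this.synonyms = synonyms;
    }

    // Returns every path through the lattice as a plain query string,
    // starting with the unmodified query.
    List<String> expand(String query) {
        String[] tokens = query.trim().split("\\s+");
        List<String> out = new ArrayList<>();
        walk(tokens, 0, "", out);
        return out;
    }

    private void walk(String[] tokens, int pos, String prefix, List<String> out) {
        if (pos == tokens.length) {
            out.add(prefix.trim());
            return;
        }
        // Path 1: keep the original token at this position.
        walk(tokens, pos + 1, prefix + " " + tokens[pos], out);
        // Path 2: for each phrase starting here that has synonyms,
        // substitute each alternative and continue after the phrase.
        for (int end = pos + 1; end <= tokens.length; end++) {
            String phrase = String.join(" ", Arrays.copyOfRange(tokens, pos, end));
            for (String alt : synonyms.getOrDefault(phrase, List.of())) {
                walk(tokens, end, prefix + " " + alt, out);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> syn = new LinkedHashMap<>();
        syn.put("dog", List.of("hound", "canis familiaris"));
        for (String q : new SynonymLattice(syn).expand("dog bite")) {
            System.out.println(q);
        }
    }
}
```

Because the expansion happens on the raw token sequence before any per-form parsing, a multi-word alternative such as "canis familiaris" stays together as one unit in its own expanded query. The number of expanded forms grows multiplicatively with the number of matching phrases, which is the kind of limitation the comment above asks to have documented.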

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

