[jira] [Commented] (SOLR-4381) Query-time multi-word synonym expansion

JIRA Wed, 30 Jan 2013 07:45:17 -0800

    [ 
https://issues.apache.org/jira/browse/SOLR-4381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13566549#comment-13566549
 ]


Jan Høydahl commented on SOLR-4381:
-----------------------------------

We'd benefit from a more component-based QP framework; then this could be a 
plugin. But that's for another century, I guess :)

bq. I agree that the less configuration, the better. However, I kind of like 
leaving the SynonymFilterFactory out of the analysis chains, because it makes 
it clearer that the synonym expansion logic isn't happening there at all. Plus, 
in most of the use cases we've seen, the only difference between the query-time 
analyzer and the index-time analyzer was the SynonymFilterFactory itself, so 
removing it gained us some code simplicity, by allowing us to define just one 
analyzer for both. Perhaps other folks have had different experiences, though.

Sure, it's confusing not to have a WYSIWYG Analysis. Perhaps we can include 
fieldType references instead of defining analysis with a new syntax, something 
like what SpellCheckComponent does with its config param {{queryAnalyzerFieldType}}.
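
For reference, SpellCheckComponent is typically configured along these lines in 
solrconfig.xml (the fieldType name {{text_general}} is just an example; any 
fieldType defined in the schema would do):

{code:xml}
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <!-- Query analysis is borrowed from this schema fieldType -->
  <str name="queryAnalyzerFieldType">text_general</str>
</searchComponent>
{code}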

Perhaps even better than tying a dictionary to a fieldType would be the ability 
to choose a dictionary per field name. Here's an imagined config based on these 
ideas:

{code:xml}
<queryParser name="synonym_edismax" 
class="solr.SynonymExpandingExtendedDismaxQParserPlugin">
  <str name="defaultDict">english</str>
  <lst name="dictionaries">
    <lst name="english">
      <str name="fieldType">synonym_type_en</str>
      <str name="useForFields">title *_en</str>
    </lst>
    <lst name="addresses">
      <str name="fieldType">synonym_type_addr</str>
      <str name="useForFields">street city state</str>
    </lst>
  </lst>
</queryParser>
{code}

We could even have a convention: if the queryParser config is empty, look for 
a fieldType in the schema named "synonymDefaultAnalysis" and use it for 
synonym expansion on all fields of a TextField type.
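
As a rough sketch of that convention, the schema could contain a fieldType like 
the one below. The tokenizer and filter chain and the {{synonyms.txt}} file name 
are assumptions for illustration, not something taken from the patch:

{code:xml}
<fieldType name="synonymDefaultAnalysis" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Hypothetical usage: looked up by the query parser for expansion only -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
  </analyzer>
</fieldType>
{code}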
                
> Query-time multi-word synonym expansion
> ---------------------------------------
>
>                 Key: SOLR-4381
>                 URL: https://issues.apache.org/jira/browse/SOLR-4381
>             Project: Solr
>          Issue Type: Improvement
>          Components: query parsers
>            Reporter: Nolan Lawson
>            Priority: Minor
>              Labels: multi-word, queryparser, synonyms
>             Fix For: 4.2, 5.0
>
>         Attachments: SOLR-4381.patch
>
>
> This is an issue that seems to come up perennially.
> The [Solr 
> docs|http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory]
>  caution that index-time synonym expansion should be preferred to query-time 
> synonym expansion, due to the way multi-word synonyms are treated and how IDF 
> values can be boosted artificially. But query-time expansion should have huge 
> benefits, given that changes to the synonyms don't require re-indexing, the 
> index size stays the same, and the IDF values for the documents don't get 
> permanently altered.
> The proposed solution is to move the synonym expansion logic out of the 
> analysis chain (either query- or index-time) and into a new QueryParser.  See 
> the attached patch for an implementation.
> The core Lucene functionality is untouched.  Instead, the EDismaxQParser is 
> extended, and synonym expansion is done on-the-fly.  Queries are parsed into 
> a lattice (i.e. all possible synonym combinations), while individual 
> components of the query are still handled by the EDismaxQParser itself.
> It's not an ideal solution by any stretch. But it's nice and self-contained, 
> so it invites experimentation and improvement.  And I think it fits in well 
> with the merry band of misfit query parsers, like {{func}} and {{frange}}.
> More details about this solution can be found in [this blog 
> post|http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/] and 
> [the Github page for the 
> code|https://github.com/healthonnet/hon-lucene-synonyms].
> At the risk of tooting my own horn, I also think this patch sufficiently 
> fixes SOLR-3390 (highlighting problems with multi-word synonyms) and 
> LUCENE-4499 (better support for multi-word synonyms).

