[jira] [Commented] (SOLR-10102) SynonymFilterFactory in example file is on query not index

Shawn Heisey (JIRA) Tue, 07 Feb 2017 13:18:07 -0800

    [ 
https://issues.apache.org/jira/browse/SOLR-10102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15856811#comment-15856811
 ]


Shawn Heisey commented on SOLR-10102:
-------------------------------------

If the synonyms are expanded at index time, then you must completely reindex if 
you change your list of synonyms.  Also, your index might get significantly 
larger.

Applying them at query time results in a smaller index, and the ability to 
change the list without reindexing.

Assuming the standard Lucene query parser, the default operator is "OR", 
which means that the query you're talking about would be "5 OR five" unless 
the defaults in Solr are changed.  Changing the default operator to "AND" tends 
to have effects that people don't think about.
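
For reference, the operator can be set explicitly instead of relying on the 
default.  This is only a sketch: the "/select" handler name and the placement 
in solrconfig.xml are assumptions for illustration, not something taken from 
the example configs under discussion.

{code:xml}
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- q.op sets the default operator for the standard query parser. -->
    <!-- "OR" is what you get when nothing is configured. -->
    <str name="q.op">OR</str>
  </lst>
</requestHandler>
{code}

The same parameter can also be sent on an individual request (q.op=AND), which 
is a low-risk way to see how the operator changes the results for a query.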

If you've changed the default operator to "AND", it sounds like you do indeed 
want to apply synonyms at index time, but for somebody who is going with the 
defaults, applying them at query time can make things better.
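
For the index-time case, a sketch of what the field type could look like, 
based on the text_general definition quoted in the issue.  It assumes that 
synonyms.txt contains a line such as "5,five", which is a guess at the 
reporter's setup rather than something from the shipped example files.

{code:xml}
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" multiValued="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <!-- Synonyms are expanded here, so both "5" and "five" get indexed. -->
    <!-- Any change to synonyms.txt requires a full reindex. -->
    <filter class="solr.SynonymFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <!-- No synonym filter at query time in this variant. -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
{code}

With this layout a query for either "5" or "five" can match the same 
documents regardless of the default operator, at the cost of a larger index.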

Sounds like the docs need a little bit of a tweak.

> SynonymFilterFactory in example file is on query not index
> ----------------------------------------------------------
>
>                 Key: SOLR-10102
>                 URL: https://issues.apache.org/jira/browse/SOLR-10102
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: examples
>    Affects Versions: 4.10.2, 6.4.1
>            Reporter: Mike Lissner
>
> The example files for both 4.10.2 and 6.4.1 have entries like these:
> {code:xml}
>   <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" multiValued="true">
>     <analyzer type="index">
>       <tokenizer class="solr.StandardTokenizerFactory"/>
>       <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
>       <filter class="solr.LowerCaseFilterFactory"/>
>     </analyzer>
>     <analyzer type="query">
>       <tokenizer class="solr.StandardTokenizerFactory"/>
>       <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
>       <!-- THIS IS WRONG, RIGHT? -->
>       <filter class="solr.SynonymFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms.txt"/>
>       <filter class="solr.LowerCaseFilterFactory"/>
>     </analyzer>
>   </fieldType>
> {code}
> You'll note that the synonym filter is applied at query time, which will 
> totally fail. Even [the 
> docs|https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory]
>  say:
> bq. The recommended approach for dealing with synonyms like this, is to 
> expand the synonym when indexing.
> Can we fix this? Or is there a reason why this is like this? As I understand 
> it, having synonyms on the query means that things just won't be returned 
> that should be. 
> For example, we have the token "5" set up with a synonym to the word "five". 
> So, if somebody searches for 5, the query filter will expand it to "5 AND 
> five", which, sure enough, the index doesn't match....no results. 
> So...instead of expanding the result set, like synonyms are supposed to do, 
> this actively contracts it.
> I hope my frustration in this is misplaced, but if I'm right about this bug, 
> can I say that this is the kind of thing that makes Solr super frustrating to 
> use? 



