[jira] [Commented] (SOLR-10102) SynonymFilterFactory in example file is on query not index

Cassandra Targett (JIRA) Tue, 07 Feb 2017 15:52:20 -0800

    [ https://issues.apache.org/jira/browse/SOLR-10102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15857051#comment-15857051 ]


Cassandra Targett commented on SOLR-10102:
------------------------------------------

I read the sentence somewhat differently, as applying to the concept of 
multi-word synonyms, and not synonyms in general. Here is the full context:

bq. Keep in mind that while the SynonymFilter will happily work with synonyms 
containing multiple words (ie: "sea biscuit, sea biscit, seabiscuit") The 
recommended approach for dealing with synonyms like this, is to expand the 
synonym when indexing.

While "The" is capitalized there, the first sentence is incomplete unless you
assume the section after the parentheses is the remainder of the sentence. Read
that way, I think the sentence recommends index-time synonyms only when working
with multi-word synonyms, not in general.

The section begins with a link to the documentation for general use
(https://cwiki.apache.org/confluence/display/solr/Filter+Descriptions#FilterDescriptions-SynonymFilter),
but whoever added that did not make it clear that the REAL documentation for
Synonym Filters is in the Ref Guide and no longer in the wiki.

> SynonymFilterFactory in example file is on query not index
> ----------------------------------------------------------
>
>                 Key: SOLR-10102
>                 URL: https://issues.apache.org/jira/browse/SOLR-10102
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: examples
>    Affects Versions: 4.10.2, 6.4.1
>            Reporter: Mike Lissner
>
> The example files for both 4.10.2 and 6.4.1 have entries like these:
> {code:xml}
>   <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" multiValued="true">
>     <analyzer type="index">
>       <tokenizer class="solr.StandardTokenizerFactory"/>
>       <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
>       <filter class="solr.LowerCaseFilterFactory"/>
>     </analyzer>
>     <analyzer type="query">
>       <tokenizer class="solr.StandardTokenizerFactory"/>
>       <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
>       <!-- THIS IS WRONG, RIGHT? -->
>       <filter class="solr.SynonymFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms.txt"/>
>       <filter class="solr.LowerCaseFilterFactory"/>
>     </analyzer>
>   </fieldType>
> {code}
> You'll note that the synonym filter is applied at query time, which will
> totally fail. Even [the docs|https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory] say:
> bq. The recommended approach for dealing with synonyms like this, is to 
> expand the synonym when indexing.
> Can we fix this? Or is there a reason it is set up this way? As I understand
> it, having synonyms on the query means that things that should be returned
> just won't be.
> For example, we have the token "5" set up with a synonym to the word "five".
> So, if somebody searches for 5, the query filter will expand it to "5 AND
> five", which, sure enough, the index doesn't match... no results.
> So instead of expanding the result set, as synonyms are supposed to do,
> this actively contracts it.
> I hope my frustration in this is misplaced, but if I'm right about this bug, 
> can I say that this is the kind of thing that makes Solr super frustrating to 
> use? 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

