[ https://issues.apache.org/jira/browse/SOLR-11954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16388838#comment-16388838 ]

Shawn Heisey commented on SOLR-11954:
-------------------------------------

There is a very subtle difference in how the analysis works with the two 
different synonym definitions.

Either way, the query terms produced are b2, b, boron, ii, and 2.  But with the 
second definition, the "b" and "2" terms have the type "word" whereas "boron" 
and "ii" are tagged as SYNONYM.  With the first definition, all of the terms 
other than b2 are tagged as SYNONYM.  I think this is expected: an explicit => 
mapping replaces the matched token with everything on its right-hand side, so 
every term it emits comes from the synonym filter.
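As a rough illustration of that difference, here is a toy Java model of how a single rule rewrites one matching token when expand="true". It is a hypothetical sketch (the class SynonymTypeModel and its method apply are invented for this example), not Lucene's SynonymMap or SynonymFilter code, and it ignores positions, multi-word rules and ignoreCase.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model, NOT Lucene's SynonymMap/SynonymFilter code: shows which
// token types one Solr-format synonym rule would emit for a single input token.
public class SynonymTypeModel {

    // Returns "term/type" strings for the tokens emitted at the input token's position.
    static List<String> apply(String rule, String input) {
        List<String> out = new ArrayList<>();
        if (rule.contains("=>")) {
            // Explicit mapping: the left side is replaced by the right side,
            // so every emitted token is produced by the filter (type SYNONYM).
            String[] sides = rule.split("=>");
            if (!splitTrim(sides[0]).contains(input)) {
                out.add(input + "/word"); // rule does not match: token passes through
                return out;
            }
            for (String t : splitTrim(sides[1])) {
                out.add(t + "/SYNONYM");
            }
        } else {
            // Equivalent synonyms with expand=true: the original token keeps its
            // own type ("word") and every other entry is added as SYNONYM.
            List<String> group = splitTrim(rule);
            out.add(input + "/word");
            if (!group.contains(input)) {
                return out;
            }
            for (String t : group) {
                if (!t.equals(input)) {
                    out.add(t + "/SYNONYM");
                }
            }
        }
        return out;
    }

    // Splits a comma-separated rule side and trims whitespace around each entry.
    static List<String> splitTrim(String s) {
        List<String> parts = new ArrayList<>();
        for (String p : s.split(",")) {
            parts.add(p.trim());
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(apply("b=>b,boron", "b")); // [b/SYNONYM, boron/SYNONYM]
        System.out.println(apply("2=>ii,2", "2"));    // [ii/SYNONYM, 2/SYNONYM]
        System.out.println(apply("b,boron", "b"));    // [b/word, boron/SYNONYM]
        System.out.println(apply("ii,2", "2"));       // [2/word, ii/SYNONYM]
    }
}
```

The model reproduces the split described above: with => every term is SYNONYM, while the equivalent form keeps the original "b" and "2" as "word".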

What's not expected is what the query parser does with it: for the first 
definition, two of the five terms that analysis produces (b and boron) are 
lost.  Something like this is probably what the first definition SHOULD have 
produced:

{noformat}
my_field:b2 my_field:b my_field:2 Synonym(my_field:boron my_field:ii)
{noformat}

Or maybe:

{noformat}
my_field:b2 my_field:b Synonym(my_field:boron) my_field:2 Synonym(my_field:ii)
{noformat}

I don't think it can possibly produce a parsedQuery identical to the one from 
the second definition, but what it IS doing does look wrong.
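The two lost terms can be checked mechanically. The sketch below uses a hypothetical helper class, LostTerms, and assumes a regex that only understands plain field:term clauses (phrase clauses are skipped); it subtracts the terms found in the first parsedQuery from the five terms analysis produced.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: compares analyzed terms with the terms that survive in a
// parsedQuery string. Only plain field:term clauses are recognized (an assumption).
public class LostTerms {

    // Collects every term that appears as field:term in the parsedQuery string.
    static Set<String> termsIn(String parsedQuery, String field) {
        Set<String> terms = new LinkedHashSet<>();
        Matcher m = Pattern.compile(Pattern.quote(field) + ":(\\w+)").matcher(parsedQuery);
        while (m.find()) {
            terms.add(m.group(1));
        }
        return terms;
    }

    // Returns the analyzed terms that do not appear anywhere in the parsedQuery.
    static Set<String> lost(Set<String> analyzed, String parsedQuery, String field) {
        Set<String> missing = new LinkedHashSet<>(analyzed);
        missing.removeAll(termsIn(parsedQuery, field));
        return missing;
    }

    public static void main(String[] args) {
        Set<String> analyzed = new LinkedHashSet<>(Arrays.asList("b2", "b", "boron", "ii", "2"));
        String firstDefinition = "my_field:b2 Synonym(my_field:2 my_field:ii)";
        System.out.println(lost(analyzed, firstDefinition, "my_field")); // [b, boron]
    }
}
```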


> Search behavior depends on kind of synonym mappings
> ---------------------------------------------------
>
>                 Key: SOLR-11954
>                 URL: https://issues.apache.org/jira/browse/SOLR-11954
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>    Affects Versions: 7.2.1
>            Reporter: Alexandr
>            Priority: Major
>              Labels: synonyms
>
> For a field with this type:
> {noformat}
> <fieldtype name="fulltext_en" class="solr.TextField" 
> autoGeneratePhraseQueries="true">
>    <analyzer type="index">
>       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>       <filter class="solr.WordDelimiterGraphFilterFactory"
> generateWordParts="1" generateNumberParts="1" splitOnNumerics="1"
> catenateWords="1" catenateNumbers="1" catenateAll="0" preserveOriginal="1" 
> protected="protwords_en.txt"/>
>       <filter class="solr.FlattenGraphFilterFactory"/>
>    </analyzer>
>    <analyzer type="query">
>       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>       <filter class="solr.WordDelimiterGraphFilterFactory"
> generateWordParts="1" generateNumberParts="1" splitOnNumerics="1"
> catenateWords="0" catenateNumbers="0" catenateAll="0" preserveOriginal="1" 
> protected="protwords_en.txt"/>
>       <filter class="solr.LowerCaseFilterFactory"/>
>       <filter class="solr.SynonymFilterFactory"
> synonyms="synonyms_en.txt" ignoreCase="true" expand="true"/>
>    </analyzer>
> </fieldtype>{noformat}
>  If synonyms are configured this way:
> {noformat}
> b=>b,boron
> 2=>ii,2{noformat}
> then for the query "my_field:b2" the parsedQuery looks like this: "my_field:b2 
> Synonym(my_field:2 my_field:ii)"
> But when synonyms are configured this way:
> {noformat}
> b,boron
> ii,2{noformat}
> then for the query "my_field:b2" the parsedQuery looks like this: "my_field:b2 
> my_field:\"b 2\" my_field:\"b ii\" my_field:\"boron 2\" my_field:\"boron ii\")"
> The second query is correct (it applies synonyms to both parts produced by the 
> word split). 
> Search behavior should not depend on the kind of synonym mapping.
> This issue has also been discussed on the solr-user mailing list:
>  
> [http://lucene.472066.n3.nabble.com/SynonymGraphFilterFactory-with-WordDelimiterGraphFilterFactory-usage-td4373974.html]
> I reproduced it on Solr 7.1.0, and it can also be reproduced on version 
> 7.2.1.
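For comparison, the four phrase clauses in the second parsedQuery appear to be every path through the token graph after the word split and synonym expansion: one alternative for "b" followed by one alternative for "2", joined as a phrase because the field sets autoGeneratePhraseQueries="true". The toy enumeration below (hypothetical class GraphPaths; not Lucene's QueryBuilder or GraphTokenStreamFiniteStrings) reproduces those clauses from a simple list-of-alternatives graph.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical toy enumeration of every path through a "sausage" token graph where
// each position holds a list of alternative terms. NOT Lucene's QueryBuilder code.
public class GraphPaths {

    // Cartesian product of the alternatives, one term per position, space-joined.
    static List<String> paths(List<List<String>> positions) {
        List<String> result = new ArrayList<>();
        result.add("");
        for (List<String> alternatives : positions) {
            List<String> next = new ArrayList<>();
            for (String prefix : result) {
                for (String term : alternatives) {
                    next.add(prefix.isEmpty() ? term : prefix + " " + term);
                }
            }
            result = next;
        }
        return result;
    }

    public static void main(String[] args) {
        // "b2" split into "b" and "2", each expanded by the equivalent synonym rules.
        List<List<String>> graph = Arrays.asList(
                Arrays.asList("b", "boron"),
                Arrays.asList("2", "ii"));
        for (String phrase : paths(graph)) {
            System.out.println("my_field:\"" + phrase + "\"");
        }
    }
}
```

Running main prints my_field:"b 2", my_field:"b ii", my_field:"boron 2" and my_field:"boron ii", the same four phrase clauses reported for the second definition; the extra my_field:b2 clause comes from preserveOriginal="1" and is outside this toy graph.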



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

