[ https://issues.apache.org/jira/browse/SOLR-11968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16373710#comment-16373710 ]
Steve Rowe commented on SOLR-11968:
-----------------------------------

Thanks Jim, I didn't realize that StopFilter (and other FilteringTokenFilters, I assume) can still produce bad token streams. I added a test showing this, based on your example, to LUCENE-4065.

bq. There are other cases where it is not possible to "fix" the graph produced by the token stream which is why I said that a stop filter that would remove gaps is IMO the best solution

Do you have examples of these other cases? Maybe put them on LUCENE-4065?

> Multi-words query time synonyms
> -------------------------------
>
>                 Key: SOLR-11968
>                 URL: https://issues.apache.org/jira/browse/SOLR-11968
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public (Default Security Level. Issues are Public)
>          Components: query parsers, Schema and Analysis
>    Affects Versions: master (8.0), 6.6.2
>         Environment: Centos 7.x
>            Reporter: Dominique Béjean
>            Assignee: Steve Rowe
>            Priority: Major
>
> I am trying multi-word query-time synonyms with Solr 6.6.2 and the
> SynonymGraphFilterFactory filter, as explained in this article:
> [https://lucidworks.com/2017/04/18/multi-word-synonyms-solr-adds-query-time-support/]
>
> My field type is:
> {code:java}
> <fieldType name="textSyn" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.ElisionFilterFactory" ignoreCase="true" articles="lang/contractions_fr.txt"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.ASCIIFoldingFilterFactory"/>
>     <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
>     <filter class="solr.FrenchMinimalStemFilterFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.ElisionFilterFactory" ignoreCase="true" articles="lang/contractions_fr.txt"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>     <filter class="solr.ASCIIFoldingFilterFactory"/>
>     <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
>     <filter class="solr.FrenchMinimalStemFilterFactory"/>
>   </analyzer>
> </fieldType>{code}
>
> synonyms.txt contains the line:
> {code:java}
> om, olympique de marseille{code}
>
> stopwords.txt contains the word:
> {code:java}
> de{code}
>
> The order of the words in my query has an impact on the query generated by
> edismax:
> {code:java}
> q={!edismax qf='name_text_gp' v=$qq}
> &sow=false
> &qq=...{code}
> With "qq=om maillot" or "qq=olympique de marseille maillot", I can see the
> synonym expansion. It is working as expected.
> {code:java}
> "parsedquery_toString":"+(((+name_text_gp:olympiqu +name_text_gp:marseil +name_text_gp:maillot) name_text_gp:om))",
> "parsedquery_toString":"+((name_text_gp:om (+name_text_gp:olympiqu +name_text_gp:marseil +name_text_gp:maillot)))",{code}
> With "qq=maillot om" or "qq=maillot olympique de marseille", I see the
> same generated query for both:
> {code:java}
> "parsedquery_toString":"+((name_text_gp:maillot) (name_text_gp:om))",
> "parsedquery_toString":"+((name_text_gp:maillot) (name_text_gp:om))",{code}
> I don't understand these generated queries. The first one looks like the
> synonym expansion is ignored, but the second one shows it is not ignored and
> only the synonym term is used.
>
> When I test the analysis for the field type, the synonyms are correctly
> expanded for all four expressions:
> {code:java}
> om maillot
> maillot om
> olympique de marseille maillot
> maillot olympique de marseille{code}
> The resulting outputs always include the following terms (obviously not
> always in the same order):
> {code:java}
> olympiqu om marseil maillot {code}
>
> So, I suspect an issue with the edismax query parser.
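
The "bad token stream" mentioned in the comment above can be illustrated with a small model of a token graph. This is only a sketch in plain Python, not Lucene code: the `Token` tuple, the helper names, and the hand-written token values for "maillot olympique de marseille" are assumptions made for illustration, not captured analyzer output. It assumes a stop filter that drops a token and carries its position increment over to the next kept token, and it shows how dropping "de" from the multi-word side path leaves a position that no token starts from.

```python
from typing import List, NamedTuple, Set, Tuple


class Token(NamedTuple):
    term: str
    pos_inc: int  # position increment relative to the previous token
    pos_len: int  # number of positions the token spans


def stop_filter(tokens: List[Token], stopwords: Set[str]) -> List[Token]:
    """Drop stopwords and add their increments to the next kept token."""
    out: List[Token] = []
    pending = 0
    for tok in tokens:
        if tok.term in stopwords:
            pending += tok.pos_inc
        else:
            out.append(Token(tok.term, tok.pos_inc + pending, tok.pos_len))
            pending = 0
    return out


def edges(tokens: List[Token]) -> List[Tuple[int, int, str]]:
    """Turn the token stream into (from_node, to_node, term) graph edges."""
    result = []
    pos = -1
    for tok in tokens:
        pos += tok.pos_inc
        result.append((pos, pos + tok.pos_len, tok.term))
    return result


def dead_ends(tokens: List[Token]) -> List[int]:
    """Nodes that some token arrives at but no token leaves (end node excluded)."""
    es = edges(tokens)
    starts = {s for s, _, _ in es}
    ends = {e for _, e, _ in es}
    return sorted(ends - starts - {max(ends)})


# Hypothetical synonym-graph output for "maillot olympique de marseille",
# with "om" spanning the same three positions as the multi-word form.
tokens = [
    Token("maillot", 1, 1),
    Token("om", 1, 3),
    Token("olympique", 0, 1),
    Token("de", 1, 1),
    Token("marseille", 1, 1),
]

print(dead_ends(tokens))                       # [] : every path is connected
print(dead_ends(stop_filter(tokens, {"de"})))  # [2] : nothing leaves node 2
```

In this model, after "de" is removed the path "olympique ... marseille" is no longer connected, while the single-token path "om" still is. That is consistent with only `name_text_gp:om` surviving in the parsed query, although the actual query-builder behavior is not modeled here.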