Hi Nick,

I think this was fixed by https://issues.apache.org/jira/browse/LUCENE-7878 in Solr 6.6.1.
--
Steve
www.lucidworks.com

> On Feb 5, 2018, at 3:58 PM, Nick D <ndrake0...@gmail.com> wrote:
>
> I have run into an issue with multi-word synonyms and a min-should-match
> (MM) of anything other than `0`, *Solr version 6.6.0*.
>
> Here is my example query, first with mm set to zero and then with a
> non-zero value:
>
> With MM set to 0:
> select?fl=*&indent=on&wt=json&debug=ALL&q=EIB&qf=ngs_title%20ngs_field_description&sow=false&mm=0
>
> which parses to:
>
> "parsedquery_toString":"+(((+ngs_field_description:enterprise
> +ngs_field_description:interface +ngs_field_description:builder)
> ngs_field_description:eib) | ((+ngs_title:enterprise
> +ngs_title:interface +ngs_title:builder) ngs_title:eib))~0.01"
>
> and using my default MM (2<-35%):
> select?fl=*&indent=on&wt=json&debug=ALL&q=EIB&qf=ngs_title%20ngs_field_description&sow=false
>
> which parses to:
>
> ((((+ngs_field_description:enterprise +ngs_field_description:interface
> +ngs_field_description:builder) ngs_field_description:eib)~2) |
> (((+ngs_title:enterprise +ngs_title:interface +ngs_title:builder)
> ngs_title:eib)~2))
>
> My synonym here is:
> EIB, Enterprise Interface Builder
>
> For my two documents I have the field ngs_title with the values "EIB" (Doc 1)
> and "enterprise interface builder" (Doc 2).
>
> For both queries Doc 1 is always returned, since EIB is matched. Doc 2,
> however, is not returned when MM is set to anything other than zero, even
> though I have defined EIB and Enterprise Interface Builder as equivalent
> synonyms. In the parsed string I can see the ~2 being applied for the MM,
> but my expectation was that it would be satisfied via the synonyms, given
> that I am not actually searching for a phrase.
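To make the `~2` concrete: each per-field query in the debug output has two optional (SHOULD) clauses, the expanded synonym `(+enterprise +interface +builder)` and `eib`. Under the documented mm syntax, `2<-35%` means "if there are 2 or fewer optional clauses, all of them are required", so `~2` demands both. The rough Python sketch below models this for Doc 2's ngs_title. The helper names (`min_should_match`, `edge_ngrams`, `matches`, `TITLE_CLAUSES`) are hypothetical, the mm arithmetic is an approximation of Solr's behaviour rather than its actual code, and the index analysis is reduced to lowercasing plus edge n-grams (the char filters, stopwords and pattern replacement are ignored).

```python
from __future__ import annotations

# Hypothetical model (not Solr code) of the per-field clause from the debug output:
#   ((+ngs_title:enterprise +ngs_title:interface +ngs_title:builder) ngs_title:eib)~2
# Each inner list is one SHOULD clause; every term inside it is required (+).
TITLE_CLAUSES = [["enterprise", "interface", "builder"], ["eib"]]


def min_should_match(optional_clauses: int, spec: str) -> int:
    """Approximate Solr's mm arithmetic for specs such as "2<-35%" or "0".

    Assumption: conditional specs are written without spaces around "<".
    """
    spec = spec.strip()
    if "<" in spec:
        result = optional_clauses  # at or below the first bound, every clause is required
        for part in spec.split():
            bound, value = part.split("<", 1)
            if optional_clauses <= int(bound):
                return result
            result = min_should_match(optional_clauses, value)
        return result
    if spec.endswith("%"):
        calc = optional_clauses * int(spec[:-1]) / 100
        # A negative percentage is how many clauses may be missing; truncate toward zero.
        result = optional_clauses + int(calc) if calc < 0 else int(calc)
    else:
        value = int(spec)
        result = optional_clauses + value if value < 0 else value
    return max(result, 0)


def edge_ngrams(text: str, min_gram: int = 1, max_gram: int = 50) -> set[str]:
    """Rough stand-in for the index chain: lowercase, split on spaces, edge n-grams."""
    terms: set[str] = set()
    for token in text.lower().split():
        for n in range(min_gram, min(max_gram, len(token)) + 1):
            terms.add(token[:n])
    return terms


def matches(doc_terms: set[str], clauses: list[list[str]], mm: int) -> bool:
    """True if at least max(mm, 1) SHOULD clauses are fully satisfied by doc_terms."""
    satisfied = sum(all(t in doc_terms for t in clause) for clause in clauses)
    # A query made only of SHOULD clauses still needs one match when mm is 0.
    return satisfied >= max(mm, 1)


if __name__ == "__main__":
    doc2 = edge_ngrams("enterprise interface builder")
    mm = min_should_match(len(TITLE_CLAUSES), "2<-35%")
    print(mm)  # 2: two optional clauses, so both are required
    print(matches(doc2, TITLE_CLAUSES, mm))  # False: "eib" is not among Doc 2's terms
    print(matches(doc2, TITLE_CLAUSES, min_should_match(2, "0")))  # True
```

In this model Doc 2 satisfies the expanded-synonym clause but not `eib`, so it passes with mm=0 and fails once `~2` requires both alternatives, which is the behaviour described above.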
>
> I couldn't find much on the relationship between the two outside of some
> of the things Doug Turnbull had linked to in another solr-user question, and
> this blog post that mentions weirdness around MM and multi-word synonyms:
>
> https://lucidworks.com/2017/04/18/multi-word-synonyms-solr-adds-query-time-support/
>
> http://opensourceconnections.com/blog/2013/10/27/why-is-multi-term-synonyms-so-hard-in-solr/
>
> I also looked through the comments on
> https://issues.apache.org/jira/browse/SOLR-9185, but at first glance didn't
> see anything that jumped out at me.
>
> Here is the field definition for the ngs_* fields:
>
> <fieldType name="ngram" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
>     <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="([()])" replacement=""/>
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.PatternReplaceFilterFactory" pattern="(^[^0-9A-Za-z_]+)|([^0-9A-Za-z_]+$)" replacement=""/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>     <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="50"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
> </fieldType>
>
> I am not sure whether we can no longer use MM for these types of queries or
> whether I have set something up incorrectly. Any help would be greatly
> appreciated.
>
> Nick