[ https://issues.apache.org/jira/browse/LUCENE-7698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ere Maijala updated LUCENE-7698:
--------------------------------
    Description:
(Please pardon me if the project or component is wrong!)

CommonGramsQueryFilter breaks phrase queries. The behavior also seems to change when adjacent terms are added or removed.

Steps to reproduce:

1.) Download and extract Solr (in my test case, version 6.4.1) somewhere.

2.) Edit server/solr/configsets/sample_techproducts_configs/conf/managed-schema and modify the text_general fieldType by adding CommonGrams(Query)Filter before the StopFilter:

    <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.CommonGramsFilterFactory" ignoreCase="true" words="stopwords.txt" />
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
        <!-- in this example, we will only use synonyms at query time
        <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
        -->
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.CommonGramsQueryFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

3.) Add "with" to server/solr/configsets/sample_techproducts_configs/conf/stopwords.txt and make sure the file has correct line endings (as extracted from the Solr zip, it seems to contain DOS/Windows line endings, which may break things).

4.) Run the techproducts example with "bin/solr -e techproducts".

5.) Browse to <http://localhost:8983/solr/techproducts/select?q=%22iPod%20with%20Video%22&debugQuery=true>

6.) Observe that parsedquery in the debug output is empty.

7.) Browse to <http://localhost:8983/solr/techproducts/select?q=%22Apple%2060%20GB%20iPod%20with%20Video%20Playback%20Black%22&debugQuery=true>

8.) Observe that parsedquery contains ipod_with as expected but not with_video.

  was: the same description without the opening parenthetical note.

> CommonGramsQueryFilter in the query analyzer chain breaks phrase queries
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-7698
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7698
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/queryparser
>    Affects Versions: 6.4, 6.4.1
>            Reporter: Ere Maijala
>              Labels: regression
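
Steps 5 to 8 can also be checked from a script instead of a browser. The following is a minimal Python sketch, assuming the techproducts example from step 4 is running on localhost:8983. The function names (build_debug_url, parsed_query, missing_grams, check_phrase) are made up for illustration. The wt=json parameter is an addition that does not appear in the URLs of the report; it is assumed here so that the response can be parsed as JSON, with the debug output expected under debug.parsedquery.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Assumption: Solr runs locally with the techproducts example (step 4).
SOLR_SELECT = "http://localhost:8983/solr/techproducts/select"


def build_debug_url(phrase, base=SOLR_SELECT):
    """Build a select URL for a quoted phrase query with debugQuery enabled."""
    params = {"q": '"%s"' % phrase, "debugQuery": "true", "wt": "json"}
    # quote_via=quote encodes spaces as %20, as in the URLs of steps 5 and 7.
    return base + "?" + urlencode(params, quote_via=quote)


def parsed_query(response):
    """Return debug.parsedquery from a decoded JSON response, or '' if absent."""
    return response.get("debug", {}).get("parsedquery", "")


def missing_grams(response, grams):
    """List the expected bigrams that do not occur in parsedquery."""
    pq = parsed_query(response)
    return [g for g in grams if g not in pq]


def check_phrase(phrase, grams):
    """Query the running Solr instance and report which bigrams are missing."""
    with urlopen(build_debug_url(phrase)) as resp:
        body = json.load(resp)
    return missing_grams(body, grams)
```

With Solr running, check_phrase("iPod with Video", ["ipod_with", "with_video"]) returns the bigrams that are absent from parsedquery. Going by the observations in steps 6 and 8, both would be reported missing for the short phrase, and only with_video for the longer one.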