Steve Rowe created SOLR-10423:
---------------------------------

             Summary: ShingleFilter causes overly restrictive queries to be produced
                 Key: SOLR-10423
                 URL: https://issues.apache.org/jira/browse/SOLR-10423
             Project: Solr
          Issue Type: Bug
      Security Level: Public (Default Security Level. Issues are Public)
            Reporter: Steve Rowe


When {{sow=false}} and {{ShingleFilter}} is included in the query analyzer, 
{{QueryBuilder}} produces queries that inappropriately require sequential 
terms.  E.g. the query "A B C" produces {{(+A_B +B_C) A_B_C}} when the query 
analyzer includes {{<filter class="solr.ShingleFilterFactory" 
maxShingleSize="3" outputUnigrams="false" tokenSeparator="_"/>}}.
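
The shape of that query can be reproduced with a small toy model. This is only a sketch, not Lucene's code: the names shingle_edges, paths and to_query are hypothetical, and the rule that a shingle of n tokens spans n - minShingleSize + 1 positions is an assumption chosen because it reproduces the reported output. Each path through the resulting token graph becomes one clause, and every shingle on a multi-shingle path is marked as required.

```python
def shingle_edges(tokens, min_size=2, max_size=3, sep="_"):
    """Return one (start, end, text) edge per shingle; unigrams are omitted.

    Assumption (not taken from Lucene): a shingle of n tokens starting at
    position i ends at position i + n - min_size + 1.
    """
    edges = []
    for start in range(len(tokens)):
        for size in range(min_size, max_size + 1):
            if start + size <= len(tokens):
                text = sep.join(tokens[start:start + size])
                edges.append((start, start + size - min_size + 1, text))
    return edges


def paths(edges, node, end):
    """Enumerate every sequence of shingles leading from node to end."""
    if node == end:
        return [[]]
    found = []
    for start, stop, text in edges:
        if start == node:
            for rest in paths(edges, stop, end):
                found.append([text] + rest)
    return found


def to_query(tokens, min_size=2, max_size=3, sep="_"):
    """Render one clause per path; shingles on a multi-shingle path get '+'."""
    edges = shingle_edges(tokens, min_size, max_size, sep)
    if not edges:
        return ""
    end = max(stop for _, stop, _ in edges)
    clauses = []
    for path in paths(edges, 0, end):
        if len(path) == 1:
            clauses.append(path[0])
        else:
            clauses.append("(" + " ".join("+" + t for t in path) + ")")
    return " ".join(clauses)


print(to_query(["A", "B", "C"]))  # (+A_B +B_C) A_B_C
```

In this model a document indexed from "A B" contains only the shingle A_B, so it satisfies neither clause even though it shares a shingle with the query.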

Aman Deep Singh reported this problem on the solr-user list. From 
[http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201703.mbox/%3ccanegtx9bwbpwqc-cxieac7qsas7x2tgzovomy5ztiagco1p...@mail.gmail.com%3e]:

{quote}
I was trying to use the shingle filter, but it was not creating the query as
desired.

my schema is

{noformat}
<fieldType name="cust_shingle" class="solr.TextField" 
positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.ShingleFilterFactory" outputUnigrams="false" 
maxShingleSize="4"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
<field name="nameShingle" type="cust_shingle" indexed="true" stored="true"/>
{noformat}

my solr query is

{noformat}
http://localhost:8983/solr/productCollection/select?
 defType=edismax
&debugQuery=true
&q=one%20plus%20one%20four
&qf=nameShingle
&sow=false
&wt=xml
{noformat}

and it was creating the parsed query as

{noformat}
<str name="parsedquery">
(+(DisjunctionMaxQuery(((+nameShingle:one plus +nameShingle:plus one
+nameShingle:one four))) DisjunctionMaxQuery(((+nameShingle:one plus
+nameShingle:plus one four))) DisjunctionMaxQuery(((+nameShingle:one plus one 
+nameShingle:one four))) DisjunctionMaxQuery((nameShingle:one plus one 
four)))~1)/no_coord
</str>
<str name="parsedquery_toString">
*+((((+nameShingle:one plus +nameShingle:plus one +nameShingle:one four))
((+nameShingle:one plus +nameShingle:plus one four)) ((+nameShingle:one
plus one +nameShingle:one four)) (nameShingle:one plus one four))~1)*
</str>
{noformat}

So token creation works as expected, but the parsed query uses the boolean + 
operator, which causes the problem: if I have a document named "one plus one", 
it should match according to the shingles, since its tokens will be 
("one plus", "one plus one", "plus one").

I have tried q.op and experimented with mm as well, but nothing gives me the
correct results.

Any idea how I can fetch that document even when it is missing some of the
query's tokens?

My expected result is to get the document "one plus one" even when the user's 
query contains additional terms, e.g. "one plus one two".
{quote}
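
To make the reported mismatch concrete, the check below evaluates the four disjuncts from the parsed query above against the shingles of a document named "one plus one". It is a simplified sketch: the clause lists are copied by hand from the reported output, the helper names matches_reported and matches_relaxed are hypothetical, scoring is ignored, and the relaxed variant (every shingle optional) only illustrates the behaviour the reporter expected, not necessarily how a fix should work.

```python
# Disjuncts copied from the reported parsedquery; within a disjunct every
# shingle carries '+', i.e. all of them are required.
reported_clauses = [
    ["one plus", "plus one", "one four"],
    ["one plus", "plus one four"],
    ["one plus one", "one four"],
    ["one plus one four"],
]

# Shingles indexed for a document named "one plus one", as listed in the report.
doc_shingles = {"one plus", "one plus one", "plus one"}


def matches_reported(doc_terms, clauses, mm=1):
    """True if at least mm disjuncts have all of their required shingles."""
    satisfied = sum(all(t in doc_terms for t in clause) for clause in clauses)
    return satisfied >= mm


def matches_relaxed(doc_terms, clauses, mm=1):
    """Hypothetical alternative: treat every distinct shingle as optional."""
    query_terms = {t for clause in clauses for t in clause}
    return len(query_terms & doc_terms) >= mm


print(matches_reported(doc_shingles, reported_clauses))  # False
print(matches_relaxed(doc_shingles, reported_clauses))   # True
```

Every disjunct requires at least one shingle containing "four", which the document does not have, so minimum-should-match of 1 is never reached.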



