[jira] [Comment Edited] (SOLR-4362) edismax, phrase query with slop, pf parameter

Elizabeth Haubert (JIRA) Fri, 30 Nov 2018 12:29:28 -0800


    [ 
https://issues.apache.org/jira/browse/SOLR-4362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16705244#comment-16705244
 ]


Elizabeth Haubert edited comment on SOLR-4362 at 11/30/18 8:28 PM:
-------------------------------------------------------------------

This is still an issue on the 7_6 branch. The underlying problem is related to 
https://issues.apache.org/jira/browse/SOLR-12260?filter=-2, where the list of 
"normal clauses" used for generating pf/pf2/pf3 is not correct. Note that pf2 
must be set for the problem to occur.

Using the example query:
{code:java}
http://localhost:8983/solr/new_core/select?debugQuery=on&defType=edismax&pf2=text&q="phrase query"~10 term
{code}
It will produce:
{code:java}
<str name="parsedquery_toString">+((text:"phrase query"~10) (text:term)) (text:"10 term")</str>
{code}
When Solr parses the original query string, it generates 3 clauses:
 * "phrase query"
 * ~10
 * term

In the course of 
[parseOriginalQuery|https://github.com/apache/lucene-solr/blob/5c4ab188eb09ad3215f461523b9873037803ed7e/solr/core/src/java/org/apache/solr/search/ExtendedDismaxQParser.java#L384],
 "phrase query" is identified as a phrase, and ~10 is correctly identified as 
the slop associated with that phrase and removed from the clause list, but only 
within the scope of parseOriginalQuery.

That change is not picked up when control flow returns to parse(). So when 
addPhraseQueries looks for shingles to glue together, it starts with:
 * "phrase query"
 * ~10
 * term

It rejects "phrase query" as a candidate for shingling because it is a phrase, 
and is left with:
 * ~10
 * term

It then generates the pf2 clause "10 term" and appends it to the query. If 
the underlying field tokenization strips off punctuation, documents containing 
the phrase "10 term" will match that clause.
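
The sequence described above can be modelled outside Solr. The following is only a minimal sketch of the stale-clause-list behaviour, not the ExtendedDismaxQParser code: the class SlopClauseSketch, its Clause type, and the methods withoutSlop and bigramShingles are hypothetical names made up for illustration, and the punctuation stripping is an assumed stand-in for the field's analysis chain.

```java
import java.util.ArrayList;
import java.util.List;

public class SlopClauseSketch {

    /** Hypothetical stand-in for a parsed clause; not Solr's own clause class. */
    static final class Clause {
        final String raw;
        final boolean isPhrase;

        Clause(String raw, boolean isPhrase) {
            this.raw = raw;
            this.isPhrase = isPhrase;
        }
    }

    /** The three clauses listed above for the query: "phrase query"~10 term */
    static List<Clause> exampleClauses() {
        List<Clause> clauses = new ArrayList<>();
        clauses.add(new Clause("\"phrase query\"", true));
        clauses.add(new Clause("~10", false));
        clauses.add(new Clause("term", false));
        return clauses;
    }

    /** Returns a new list without slop markers that directly follow a phrase. */
    static List<Clause> withoutSlop(List<Clause> clauses) {
        List<Clause> out = new ArrayList<>();
        for (int i = 0; i < clauses.size(); i++) {
            Clause c = clauses.get(i);
            boolean slopAfterPhrase =
                i > 0 && clauses.get(i - 1).isPhrase && c.raw.matches("~\\d+");
            if (!slopAfterPhrase) {
                out.add(c);
            }
        }
        return out;
    }

    /** Builds two-word shingles from the non-phrase clauses (a pf2 stand-in). */
    static List<String> bigramShingles(List<Clause> clauses) {
        List<String> normal = new ArrayList<>();
        for (Clause c : clauses) {
            if (!c.isPhrase) {
                // Assumption: analysis strips punctuation, so "~10" becomes "10".
                normal.add(c.raw.replaceAll("\\p{Punct}", ""));
            }
        }
        List<String> shingles = new ArrayList<>();
        for (int i = 0; i + 1 < normal.size(); i++) {
            shingles.add(normal.get(i) + " " + normal.get(i + 1));
        }
        return shingles;
    }

    public static void main(String[] args) {
        List<Clause> clauses = exampleClauses();
        // Shingling the unfiltered list reproduces the spurious phrase.
        System.out.println(bigramShingles(clauses));              // [10 term]
        // Shingling the slop-filtered list leaves nothing to pair with "term".
        System.out.println(bigramShingles(withoutSlop(clauses))); // []
    }
}
```

In this model the spurious "10 term" shingle disappears as soon as the shingling step receives the filtered list rather than the original one, which is the shape of the mismatch between parseOriginalQuery and parse() described above.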

Updated the original unit test to use the same schema field as testPfPs().

 



> edismax, phrase query with slop, pf parameter
> ---------------------------------------------
>
>                 Key: SOLR-4362
>                 URL: https://issues.apache.org/jira/browse/SOLR-4362
>             Project: Solr
>          Issue Type: Bug
>          Components: query parsers
>    Affects Versions: 4.1
>            Reporter: Ahmet Arslan
>            Priority: Major
>              Labels: edismax, pf
>         Attachments: SOLR-4362.patch, SOLR-4362.patch
>
>
> When a sloppy phrase query (plus an additional term) is used with edismax, the 
> slop value is searched against the fields supplied in the pf parameter.
> Example: with the URL parameters &q="phrase query"~10 term&qf=text&pf=text, a 
> document having "10 term" in its text field is boosted.


