[ 
https://issues.apache.org/jira/browse/SOLR-12260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16705285#comment-16705285
 ] 

Elizabeth Haubert commented on SOLR-12260:
------------------------------------------

Reduced the scope of this issue to phrase clauses in the case where SOW=false.

Consider the case where there is a multi-term synonym with query-time expansion:
{code:java}
anaphylaxis, allergic reaction{code}

And a query "cat anaphylaxis aspirin":
http://localhost:8983/solr/new_core/select?debugQuery=on&defType=edismax&pf2=text&q=cat%20anaphylaxis%20aspirin

This generates 3 single-term clauses:
*  ((text:cat)
*  ((text:"allergic reaction" text:anaphylaxis))
*  (text:aspirin)) 
 
and 2 pf2 clauses:
* ((spanNear([text:cat, spanOr([spanNear([text:allergic, text:reaction], 0, 
true), text:anaphylaxis])], 0, true)) 
* (spanNear([spanOr([spanNear([text:allergic, text:reaction], 0, true), 
text:anaphylaxis]), text:aspirin], 0, true)))

If we instead search using the multi-term form of the synonym, "cat allergic reaction aspirin":
http://localhost:8983/solr/new_core/select?debugQuery=on&defType=edismax&df=text&pf2=text&q=cat%20allergic%20reaction%20aspirin

The base query generated is the same:
* ((text:cat) 
* ((text:anaphylaxis text:"allergic reaction")) 
* (text:aspirin))

But the pf2 clauses are quite different:
*  ((text:"cat allergic")
*  (spanOr([text:anaphylaxis, spanNear([text:allergic, text:reaction], 0, 
true)])) 
* (text:"reaction aspirin"))

Aside from having two very different phrase boosts for the same base query, the 
consequences of this in the second case are fairly ugly for relevance as 
compared to the first:
1. The single term "anaphylaxis" will be boosted at the pf2 value in addition 
to the q value
2. Phrases containing "cat anaphylaxis" or "anaphylaxis aspirin" will not be 
considered for pf2
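
The difference between the two sets of pf2 clauses can be reproduced with a toy model. This is only a sketch, not the edismax parser's code: it assumes that pf2 pairs up adjacent whitespace-separated words and then runs each pair through synonym expansion independently, which is an inference from the debug output above. The names analyze, render and pf2_shingles are hypothetical.

```python
# Toy model only: NOT Solr code. Assumes each pf2 word pair is analyzed
# on its own, as the debug output above suggests.
SYNONYM_GROUP = [("anaphylaxis",), ("allergic", "reaction")]


def analyze(tokens):
    """Expand synonyms; return a list of positions, each a list of alternatives."""
    positions = []
    i = 0
    while i < len(tokens):
        # Try the longest synonym form first.
        for alt in sorted(SYNONYM_GROUP, key=len, reverse=True):
            if tuple(tokens[i:i + len(alt)]) == alt:
                positions.append(list(SYNONYM_GROUP))
                i += len(alt)
                break
        else:
            # No synonym starts here: keep the plain term.
            positions.append([(tokens[i],)])
            i += 1
    return positions


def render(positions):
    """Render positions compactly, e.g. 'cat (anaphylaxis|allergic reaction)'."""
    parts = []
    for alts in positions:
        if len(alts) == 1:
            parts.append(" ".join(alts[0]))
        else:
            parts.append("(" + "|".join(" ".join(a) for a in alts) + ")")
    return " ".join(parts)


def pf2_shingles(query):
    """Pair adjacent words, then expand synonyms inside each pair separately."""
    words = query.split()
    return [render(analyze(words[i:i + 2])) for i in range(len(words) - 1)]


print(pf2_shingles("cat anaphylaxis aspirin"))
print(pf2_shingles("cat allergic reaction aspirin"))
```

Under these assumptions, the first query yields two pairs that each contain the whole synonym group, while the second yields "cat allergic", the synonym group on its own, and "reaction aspirin", mirroring the two problems listed above.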

If the user puts the multi-term synonym "allergic reaction" in quotes, the 
results get even worse, because the quoted phrase is removed from the list of 
clauses considered as input for pf/pf2/pf3:

For the query cat "allergic reaction" dog, the base query keeps the same three-clause shape:
* ((text:cat) 
* (spanOr([text:anaphylaxis, spanNear([text:allergic, text:reaction], 0, 
true)])) 
* (text:dog)) 

But now "allergic reaction" and "anaphylaxis" are removed entirely from the 
phrase clauses:
* (text:"cat dog")
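
A minimal sketch of the effect, again as a toy model rather than the parser's actual code: it assumes quoted clauses are simply filtered out before adjacent clauses are paired for pf2. The functions parse_clauses and pf2_inputs and the include_phrases flag are hypothetical, and the construction of nested span queries is not modeled.

```python
import re

# Toy model only: NOT Solr code. A clause is (text, is_quoted_phrase).


def parse_clauses(query):
    """Split a query into bareword and quoted-phrase clauses."""
    clauses = []
    for quoted, bare in re.findall(r'"([^"]*)"|(\S+)', query):
        if bare:
            clauses.append((bare, False))
        else:
            clauses.append((quoted, True))
    return clauses


def pf2_inputs(clauses, include_phrases):
    """Pair up adjacent clauses, optionally dropping quoted phrases first."""
    kept = [text for text, is_phrase in clauses
            if include_phrases or not is_phrase]
    return [(kept[i], kept[i + 1]) for i in range(len(kept) - 1)]


clauses = parse_clauses('cat "allergic reaction" dog')

# Behavior described above: the quoted phrase is dropped, leaving "cat dog".
print(pf2_inputs(clauses, include_phrases=False))

# Behavior requested in this issue: the phrase takes part as a single unit.
print(pf2_inputs(clauses, include_phrases=True))
```

With the phrase excluded, the only pf2 input is ("cat", "dog"); with it included, the inputs become ("cat", "allergic reaction") and ("allergic reaction", "dog").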

What is the history around removing phrase clauses from consideration as input 
to pf/pf2/pf3?


> edismax: Include phrase clauses as terms in pf/pf2/pf3  when SOW=false
> ----------------------------------------------------------------------
>
>                 Key: SOLR-12260
>                 URL: https://issues.apache.org/jira/browse/SOLR-12260
>             Project: Solr
>          Issue Type: New Feature
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: query parsers
>            Reporter: Elizabeth Haubert
>            Priority: Major
>
> Phrase queries are currently built only on bareword clauses, which causes 
> unexpected behavior for queries with mixed quoted and bareword terms:
> q:cat "allergic reaction" dog  
> will flag "allergic reaction" as a phrase, and so will include it in none of 
> pf/pf2/pf3
> pf or pf2 will be generated as "cat dog".
> At a minimum, it would be nice if phrases would be applied as stand-alone 
> entities to pf2/pf3, if they contain the appropriate number of terms.  But I 
> think the work that has been done to accommodate graph queries should also be 
> able to handle these phrase terms following the pattern of:
> spanNear[text:cat, spanNear(text:allergic, text:reaction, 0, true), text:dog]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
