[jira] [Commented] (LUCENE-8531) QueryBuilder hard-codes inOrder=true for generated sloppy span near queries

Uwe Schindler (JIRA) Fri, 19 Oct 2018 09:58:09 -0700


    [ 
https://issues.apache.org/jira/browse/LUCENE-8531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16657089#comment-16657089
 ]


Uwe Schindler commented on LUCENE-8531:
---------------------------------------

+1, please do this. I will then take care of the Solr issue. This is not fully
related, but the Solr code depends on the structure of the queries that Lucene
produces and then reorders them with lots of instanceof checks. That is bad
spaghetti code, but that's how it is.

I'd like to get a Lucene class that generates edismax-like queries: it parses
some text and creates bigram and trigram shingles out of it, so that a "match"
query can assign a higher score to hits whose terms are in order and close to
each other (that is, documents rank higher when the bigrams or trigrams of your
query string appear close together in them). A lot of people use this, but
currently it only works with Solr's edismax. Whenever you want to use it with
another query parser or with Elasticsearch, you have to reimplement the
shingling.
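
To make the shingling step concrete, here is a minimal sketch in plain Java.
It is not an existing Lucene or Solr class: ShingleSketch and shingles() are
hypothetical names used only for illustration, and the sketch assumes the
text has already been split into terms. Each shingle it returns would then
become one boosted phrase clause in the generated query.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: builds word n-grams ("shingles").
final class ShingleSketch {

    /** Returns every run of `size` adjacent terms, joined by single spaces. */
    static List<String> shingles(List<String> terms, int size) {
        if (size < 1) {
            throw new IllegalArgumentException("size must be >= 1");
        }
        List<String> out = new ArrayList<>();
        // Slide a window of `size` terms over the input, one position at a time.
        for (int i = 0; i + size <= terms.size(); i++) {
            out.add(String.join(" ", terms.subList(i, i + size)));
        }
        return out;
    }
}
```

Calling shingles(terms, 2) and shingles(terms, 3) on the analyzed query terms
gives the bigram and trigram phrases. Inside an analysis chain, Lucene's
ShingleFilter does comparable work at the token-stream level.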

> QueryBuilder hard-codes inOrder=true for generated sloppy span near queries
> ---------------------------------------------------------------------------
>
>                 Key: LUCENE-8531
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8531
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/queryparser
>            Reporter: Steve Rowe
>            Assignee: Steve Rowe
>            Priority: Major
>         Attachments: LUCENE-8531.patch
>
>
> QueryBuilder.analyzeGraphPhrase() generates SpanNearQuery-s with passed-in 
> phraseSlop, but hard-codes inOrder ctor param as true.
> Before multi-term synonym support and graph token streams introduced the 
> possibility of generating SpanNearQuery-s, QueryBuilder generated 
> (Multi)PhraseQuery-s, which always interpret slop as allowing reordering 
> edits.  Solr's eDismax query parser generates phrase queries when its 
> pf/pf2/pf3 params are specified, and when multi-term synonyms are used with a 
> graph-aware synonym filter, SpanNearQuery-s are generated that require 
> clauses to be in order; unlike with (Multi)PhraseQuery-s, reordering edits 
> are not allowed, so this is a kind of regression.  See SOLR-12243 for edismax 
> pf/pf2/pf3 context.  (Note that the patch on SOLR-12243 also addresses 
> another problem that blocks eDismax from generating queries *at all* under 
> the above-described circumstances.)
> I propose adding a new analyzeGraphPhrase() method that allows configuration 
> of inOrder, which would allow eDismax to specify inOrder=false.  The existing 
> analyzeGraphPhrase() method would remain with its hard-coded inOrder=true, so 
> existing client behavior would remain unchanged.
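
The proposal quoted above can be sketched with a toy model in plain Java. This
is not Lucene code and not the attached patch: NearSketch and near() are
hypothetical names, and the model only covers two single-position terms,
assuming they sit at distinct positions. It shows the two points made in the
description: the existing entry point keeps inOrder=true by delegating to the
new one, and inOrder=false lets a reordered pair match within the same slop.

```java
// Toy model, not Lucene: "near" matching of two terms at known positions.
final class NearSketch {

    /** Existing-style entry point: behavior unchanged, inOrder stays true. */
    static boolean near(int posA, int posB, int slop) {
        return near(posA, posB, slop, true);
    }

    /** New-style entry point: the caller decides whether order matters. */
    static boolean near(int posA, int posB, int slop, boolean inOrder) {
        if (posA == posB) {
            return false; // the toy model assumes distinct positions
        }
        if (inOrder && posB < posA) {
            return false; // ordered matching rejects a reordered pair
        }
        // Number of positions lying strictly between the two terms.
        int gap = Math.abs(posB - posA) - 1;
        return gap <= slop;
    }
}
```

With slop 1, near(3, 5, 1) matches and near(5, 3, 1) does not, while
near(5, 3, 1, false) matches again, which is the behavior eDismax would ask
for by passing inOrder=false.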



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
