[jira] [Commented] (LUCENE-7848) QueryBuilder.analyzeGraphPhrase does not handle gaps correctly

Jim Ferenczi (JIRA) Thu, 13 Jul 2017 05:08:20 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-7848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16085610#comment-16085610
 ]


Jim Ferenczi commented on LUCENE-7848:
--------------------------------------

Dawid, sorry for the delay

{code}
spanNear([field:SPECIAL, 
          field:PROJECTS, 
          field:-, 
          spanOr([spanNear([SpanGap(:1), 
                            field:xxx,SPECIAL], 0, true), 
                  spanNear([SpanGap(:1), 
                            field:xxx, 
                            field:SPECIAL], 0, true)]), 
          field:PROJECTS, 
          field:-, 
          SpanGap(:1), 
          field:yyy], 0, true)
{code}

The problem is in those gaps inside {{spanOr}} -- the position increments get 
screwed up somehow. I created the above query manually and this one works just 
fine:
{code}
        Query q = SpanNearQuery.newOrderedNearQuery(field)
            .addClause(new SpanTermQuery(new Term(field, "SPECIAL")))
            .addClause(new SpanTermQuery(new Term(field, "PROJECTS")))
            .addClause(new SpanTermQuery(new Term(field, "-")))
            .addGap(1)
            .addClause(new SpanOrQuery(
                SpanNearQuery.newOrderedNearQuery(field)
                  .addClause(new SpanTermQuery(new Term(field, "xxx,SPECIAL")))
                  .addGap(1)
                  .build(),
                SpanNearQuery.newOrderedNearQuery(field)
                    .addClause(new SpanTermQuery(new Term(field, "xxx")))
                    .addClause(new SpanTermQuery(new Term(field, "SPECIAL")))
                    .build()
            ))
            .addClause(new SpanTermQuery(new Term(field, "PROJECTS")))
            .addClause(new SpanTermQuery(new Term(field, "-")))
            .addGap(1)
            .addClause(new SpanTermQuery(new Term(field, "yyy")))
            .build();
{code}

These two queries are both valid and should return results. The first one 
represents exactly the graph produced by the WordDelimiterGraphFilter, and the 
second one has an extra gap after "xxx,SPECIAL". This extra gap is not 
irrelevant: it is the only way to match the indexed form of the document 
through the path containing the term "xxx,SPECIAL". If you look at the indexed 
positions, "xxx,SPECIAL" is at position 4 and position 5 holds the term 
"SPECIAL". That is the flattened version of the graph; the query side builds 
the correct graph and does not account for the positions that the indexer 
flattened. Adding a manual gap lets "xxx,SPECIAL" skip the next position 
(5, SPECIAL) and jump directly to (6, PROJECTS).
However, the other path, containing the split terms "xxx" and "SPECIAL", should 
match with both queries. I think this is the real problem: the second query 
matches only because of the additional gap that you added.
I don't have time at the moment to look at why the SpanQuery does not match the 
first query. It deserves a separate issue anyway, so I think we should focus on 
whether the query produced by the QueryBuilder is valid or not. If it is, then 
the patch can be merged and we can look at the other problem separately.
[~mgibney] can you open a new issue or add your comment and patch to 
https://issues.apache.org/jira/browse/LUCENE-7398 ? We should focus on the 
query building in this issue first.
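
To make the position arithmetic above concrete, here is a toy model in plain 
Java. It is only a sketch, not Lucene code and not how SpanNearQuery is 
implemented: every term or gap is assumed to consume exactly one position, and 
the names GapSketch, INDEX, GAP and matchesAt are made up for the illustration. 
Positions 4, 5 and 6 are taken from the description above; placing "xxx" at 
position 4 alongside "xxx,SPECIAL" is an assumption about the flattened graph.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of slop-0 ordered matching over positions; not Lucene code. */
final class GapSketch {
    /** Hypothetical marker for a one-position gap that accepts anything. */
    static final String GAP = "<gap>";

    /**
     * Flattened index, position -> terms at that position. Positions 4..6
     * follow the comment; "xxx" sharing position 4 is an assumption.
     */
    static final Map<Integer, Set<String>> INDEX = Map.of(
        4, Set.of("xxx,SPECIAL", "xxx"),
        5, Set.of("SPECIAL"),
        6, Set.of("PROJECTS"));

    /** True if each step of the path matches at consecutive positions from start. */
    static boolean matchesAt(Map<Integer, Set<String>> index, int start, List<String> path) {
        int pos = start;
        for (String step : path) {
            boolean ok = step.equals(GAP)
                || index.getOrDefault(pos, Set.of()).contains(step);
            if (!ok) {
                return false;
            }
            pos++; // each term or gap consumes exactly one position
        }
        return true;
    }

    public static void main(String[] args) {
        // Unsplit token without a gap: position 5 holds SPECIAL, not PROJECTS.
        System.out.println(matchesAt(INDEX, 4, List.of("xxx,SPECIAL", "PROJECTS")));      // false
        // Unsplit token plus a manual gap: the gap absorbs position 5.
        System.out.println(matchesAt(INDEX, 4, List.of("xxx,SPECIAL", GAP, "PROJECTS"))); // true
        // Split tokens: xxx at 4, SPECIAL at 5, PROJECTS at 6.
        System.out.println(matchesAt(INDEX, 4, List.of("xxx", "SPECIAL", "PROJECTS")));   // true
    }
}
```

In this model the split path matches with or without the extra gap, which is 
why the failure of the first query on that path looks like a separate problem.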

> QueryBuilder.analyzeGraphPhrase does not handle gaps correctly
> --------------------------------------------------------------
>
>                 Key: LUCENE-7848
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7848
>             Project: Lucene - Core
>          Issue Type: Bug
>    Affects Versions: 6.5, 6.6
>            Reporter: Jim Ferenczi
>         Attachments: capture-3.png, LUCENE-7848-branching-spanOr.patch, 
> LUCENE-7848.patch, LUCENE-7848.patch
>
>
> Position increments greater than 1 are ignored when the query builder creates 
> a graph phrase query. 
> Instead it should use SpanNearQuery.addGap for pos incr > 1.



