[ https://issues.apache.org/jira/browse/LUCENE-7848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16085610#comment-16085610 ]
Jim Ferenczi commented on LUCENE-7848:
--------------------------------------

Dawid, sorry for the delay

{code}
spanNear([field:SPECIAL, field:PROJECTS, field:-, spanOr([spanNear([SpanGap(:1), field:xxx,SPECIAL], 0, true), spanNear([SpanGap(:1), field:xxx, field:SPECIAL], 0, true)]), field:PROJECTS, field:-, SpanGap(:1), field:yyy], 0, true)
{code}

The problem is in those gaps inside {{spanOr}} -- the position increments get screwed up somehow. I created the above query manually and this one works just fine:

{code}
Query q = SpanNearQuery.newOrderedNearQuery(field)
    .addClause(new SpanTermQuery(new Term(field, "SPECIAL")))
    .addClause(new SpanTermQuery(new Term(field, "PROJECTS")))
    .addClause(new SpanTermQuery(new Term(field, "-")))
    .addGap(1)
    .addClause(new SpanOrQuery(
        SpanNearQuery.newOrderedNearQuery(field)
            .addClause(new SpanTermQuery(new Term(field, "xxx,SPECIAL")))
            .addGap(1)
            .build(),
        SpanNearQuery.newOrderedNearQuery(field)
            .addClause(new SpanTermQuery(new Term(field, "xxx")))
            .addClause(new SpanTermQuery(new Term(field, "SPECIAL")))
            .build()
    ))
    .addClause(new SpanTermQuery(new Term(field, "PROJECTS")))
    .addClause(new SpanTermQuery(new Term(field, "-")))
    .addGap(1)
    .addClause(new SpanTermQuery(new Term(field, "yyy")))
    .build();
{code}

These two queries are valid and should return results. The first one represents exactly the graph produced by the WordDelimiterGraphFilter, and the second one has an extra gap after "xxx,SPECIAL". This extra gap is not irrelevant: it's the only way to match the indexed form of the document with the path containing the term "xxx,SPECIAL". If you look at the indexed positions, "xxx,SPECIAL" is at position 4 and position 5 has the term "SPECIAL". This is the flattened version of the graph, but the query side builds the correct version and ignores the fact that the positions are messed up by the indexer. If you add a manual gap, it allows "xxx,SPECIAL" to also skip the next position (5, SPECIAL) and jump directly to (6, PROJECTS).
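To make the position argument concrete, here is a minimal, self-contained Java sketch. It does not use Lucene and is not how SpanNearQuery is implemented; it only models an ordered, slop-0 match over a flattened position list. Positions 4, 5 and 6 are taken from the comment above; the remaining positions (0-3 and 7-9), the class name {{GapSketch}} and the {{GAP}} marker are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of ordered, slop-0 matching with gaps. Not Lucene code. */
public class GapSketch {

    /** Marker for "skip exactly one position" (hypothetical, for this sketch only). */
    public static final String GAP = "<gap>";

    /** Path through the joined token "xxx,SPECIAL", without an extra gap. */
    public static final List<String> JOINED_NO_GAP = Arrays.asList(
        "SPECIAL", "PROJECTS", "-", GAP, "xxx,SPECIAL", "PROJECTS", "-", GAP, "yyy");

    /** Same path, with a manual gap after "xxx,SPECIAL". */
    public static final List<String> JOINED_WITH_GAP = Arrays.asList(
        "SPECIAL", "PROJECTS", "-", GAP, "xxx,SPECIAL", GAP, "PROJECTS", "-", GAP, "yyy");

    /** Path through the split tokens "xxx" and "SPECIAL". */
    public static final List<String> SPLIT = Arrays.asList(
        "SPECIAL", "PROJECTS", "-", GAP, "xxx", "SPECIAL", "PROJECTS", "-", GAP, "yyy");

    /** Assumed flattened index: position -> terms stored at that position. */
    public static Map<Integer, Set<String>> flattenedIndex() {
        Map<Integer, Set<String>> index = new HashMap<>();
        put(index, 0, "SPECIAL");
        put(index, 1, "PROJECTS");
        put(index, 2, "-");
        // position 3 is assumed empty (position increment of 2)
        put(index, 4, "xxx,SPECIAL", "xxx"); // joined token flattened onto position 4
        put(index, 5, "SPECIAL");
        put(index, 6, "PROJECTS");
        put(index, 7, "-");
        // position 8 is assumed empty
        put(index, 9, "yyy");
        return index;
    }

    private static void put(Map<Integer, Set<String>> index, int pos, String... terms) {
        index.put(pos, new HashSet<>(Arrays.asList(terms)));
    }

    /** True if the path matches at consecutive positions from some start position. */
    public static boolean matches(Map<Integer, Set<String>> index, List<String> path) {
        int last = Collections.max(index.keySet());
        for (int start = 0; start <= last; start++) {
            if (matchesAt(index, path, start)) {
                return true;
            }
        }
        return false;
    }

    private static boolean matchesAt(Map<Integer, Set<String>> index, List<String> path, int start) {
        int pos = start;
        for (String step : path) {
            if (!GAP.equals(step)) {
                Set<String> terms = index.get(pos);
                if (terms == null || !terms.contains(step)) {
                    return false;
                }
            }
            pos++; // a term and a gap each consume exactly one position
        }
        return true;
    }

    public static void main(String[] args) {
        Map<Integer, Set<String>> index = flattenedIndex();
        System.out.println("joined, no gap:   " + matches(index, JOINED_NO_GAP));
        System.out.println("joined, with gap: " + matches(index, JOINED_WITH_GAP));
        System.out.println("split:            " + matches(index, SPLIT));
    }
}
```

In this toy model the joined path matches only once the extra gap is added, because "PROJECTS" is expected at position 5 but sits at position 6. The split path matches without any extra gap, which is the behaviour the comment argues the real span queries should also show.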
However, the other path, containing the split terms "xxx" and "SPECIAL", should match on both queries. I think this is the real problem, and the fact that the second query matches is just due to the additional gap that you added. I don't have time at the moment to look at why the SpanQuery does not match the first query. It deserves a separate issue anyway, so I think we should focus on whether the query produced by the QueryBuilder is valid or not. If it is, then the patch can be merged and we can look at the other problem separately.

[~mgibney] can you open a new issue or add your comment and patch to https://issues.apache.org/jira/browse/LUCENE-7398 ? We should focus on the query building in this issue first.

> QueryBuilder.analyzeGraphPhrase does not handle gaps correctly
> --------------------------------------------------------------
>
>                 Key: LUCENE-7848
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7848
>             Project: Lucene - Core
>          Issue Type: Bug
>   Affects Versions: 6.5, 6.6
>            Reporter: Jim Ferenczi
>         Attachments: capture-3.png, LUCENE-7848-branching-spanOr.patch,
>                      LUCENE-7848.patch, LUCENE-7848.patch
>
>
> Position increments greater than 1 are ignored when the query builder creates
> a graph phrase query.
> Instead it should use SpanNearQuery.addGap for pos incr > 1.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)