[ https://issues.apache.org/jira/browse/LUCENE-9418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148438#comment-17148438 ]

ASF subversion and git services commented on LUCENE-9418:
---------------------------------------------------------

Commit 3a42716cdb06ba650ccb2cbc9953c05c9a8a6abc in lucene-solr's branch refs/heads/branch_8x from Alan Woodward
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=3a42716 ]

LUCENE-9418: Fix ordered intervals over interleaved terms (#1618)

Given the input text 'A B A C', an ordered interval 'A B C' will currently
return an incorrect interval [2, 3] in addition to the correct [0, 3] interval.
This is due to a bug in the ORDERED algorithm, where we assume that after the
first interval is returned, the sub-intervals are always in order. This
assumption holds only while an interval is being minimized, because minimizing
an interval may move the earlier terms beyond the trailing terms.

For example, after the initial [0, 3] interval is found above, the algorithm
will attempt to minimize it by advancing A to [2,2]. Because this is still
before C at [3,3], but after B at [1,1], we then try advancing B, leaving it at
[Inf,Inf]. Minimization has failed, so we return the original interval of
[0,3]. However, when we come to retrieve the next interval, our sub-intervals
look like this: A[2,2], B[Inf,Inf], C[3,3]. The assumption that they are in
order is broken. The algorithm sees that A is before B, assumes that all
subsequent sub-intervals are therefore in order, and returns the new interval
[2, 3].

This commit fixes things by changing the assumption of ordering to only hold
during minimization. When first finding a candidate interval, the algorithm now
checks that all sub-intervals appear in order.
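
The broken state described above can be reproduced with a small standalone sketch. This is not the Lucene code from the commit: the class name, the method names, and the representation of each sub-interval as a {start, end} pair are assumptions made purely for illustration, with Integer.MAX_VALUE standing in for [Inf,Inf].

```java
// Hypothetical illustration only; not Lucene's ORDERED implementation.
class OrderedCheckSketch {
    // Stand-in for an exhausted sub-interval, written [Inf,Inf] above.
    static final int INF = Integer.MAX_VALUE;

    // Models the faulty assumption: only the first pair is compared, and the
    // remaining sub-intervals are assumed to be in order already.
    static boolean firstPairInOrder(int[][] subs) {
        return subs[0][1] < subs[1][0];
    }

    // Models the stricter check: no sub-interval may be exhausted, and each
    // one must end before the next one starts.
    static boolean allInOrder(int[][] subs) {
        for (int i = 0; i < subs.length; i++) {
            if (subs[i][0] == INF) {
                return false;
            }
            if (i > 0 && subs[i - 1][1] >= subs[i][0]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // State after the failed minimization: A[2,2], B[Inf,Inf], C[3,3]
        int[][] afterMinimization = { {2, 2}, {INF, INF}, {3, 3} };
        System.out.println("first-pair check: " + firstPairInOrder(afterMinimization));
        System.out.println("full check: " + allInOrder(afterMinimization));
    }
}
```

In this model the first-pair check accepts the state, which corresponds to the spurious [2, 3] hit, while the full check rejects it because B is exhausted.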

> Ordered intervals can give inaccurate hits on interleaved terms
> ---------------------------------------------------------------
>
>                 Key: LUCENE-9418
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9418
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Alan Woodward
>            Assignee: Alan Woodward
>            Priority: Major
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Given the text 'A B A C', an ordered interval over 'A B C' will return the
> inaccurate interval [2, 3], due to the way minimization is handled after
> matches are found.

--
This message was sent by Atlassian Jira (v8.3.4#803005)