[ 
https://issues.apache.org/jira/browse/LUCENE-8477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16797618#comment-16797618
 ] 

Jim Ferenczi commented on LUCENE-8477:
--------------------------------------

The patch fixes disjunctions that share a common prefix, but the same problem 
can arise for disjunctions that share a suffix. For instance, the query 
or(york, BLOCK(new, york)) has the same minimal-interval semantics as "york". 
So a query like BLOCK(in, or(york, BLOCK(new, york))) will not match "in new 
york", because the interval for "new york" is discarded in favor of the 
minimal interval "york" that it contains. We could apply the same logic and 
rewrite the query automatically, but I am sure we can find other pathological 
cases caused by minimal-interval semantics. IMO we should document this 
unintuitive behavior rather than rewriting all queries into a non-optimal 
form. 
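
To make the suffix case concrete, here is a minimal, self-contained sketch 
of the behavior described above. It is a toy model and does not use Lucene: 
the class MinimalIntervalSketch and its helpers term, block, or, nestedQuery 
and rewrittenQuery are hypothetical names invented for illustration, block is 
simplified to "left interval immediately followed by right interval", and the 
rewritten form (distributing the outer BLOCK over the disjunction) is only 
one assumed reading of "apply the same logic".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of minimal-interval semantics; NOT the Lucene implementation.
final class MinimalIntervalSketch {

    /** Closed interval of token positions [start, end]. */
    static final class Interval {
        final int start;
        final int end;

        Interval(int start, int end) {
            this.start = start;
            this.end = end;
        }

        /** True if this interval contains {@code o} and is not equal to it. */
        boolean strictlyContains(Interval o) {
            return start <= o.start && o.end <= end
                && (start != o.start || end != o.end);
        }

        @Override
        public String toString() {
            return "[" + start + "," + end + "]";
        }
    }

    /** One single-position interval per occurrence of the token. */
    static List<Interval> term(List<String> doc, String token) {
        List<Interval> out = new ArrayList<>();
        for (int i = 0; i < doc.size(); i++) {
            if (doc.get(i).equals(token)) {
                out.add(new Interval(i, i));
            }
        }
        return out;
    }

    /** Simplified BLOCK: a left interval immediately followed by a right one. */
    static List<Interval> block(List<Interval> left, List<Interval> right) {
        List<Interval> out = new ArrayList<>();
        for (Interval l : left) {
            for (Interval r : right) {
                if (l.end + 1 == r.start) {
                    out.add(new Interval(l.start, r.end));
                }
            }
        }
        return out;
    }

    /** Disjunction that keeps only minimal intervals (drops any interval containing another). */
    static List<Interval> or(List<Interval> a, List<Interval> b) {
        List<Interval> all = new ArrayList<>(a);
        all.addAll(b);
        List<Interval> out = new ArrayList<>();
        for (Interval candidate : all) {
            boolean minimal = true;
            for (Interval other : all) {
                if (candidate.strictlyContains(other)) {
                    minimal = false;
                    break;
                }
            }
            if (minimal) {
                out.add(candidate);
            }
        }
        return out;
    }

    /** BLOCK(in, or(york, BLOCK(new, york))) */
    static List<Interval> nestedQuery(List<String> doc) {
        List<Interval> york = term(doc, "york");
        return block(term(doc, "in"), or(york, block(term(doc, "new"), york)));
    }

    /** Assumed rewrite: or(BLOCK(in, york), BLOCK(in, new, york)) */
    static List<Interval> rewrittenQuery(List<String> doc) {
        List<Interval> in = term(doc, "in");
        List<Interval> york = term(doc, "york");
        return or(block(in, york), block(block(in, term(doc, "new")), york));
    }

    public static void main(String[] args) {
        List<String> doc = Arrays.asList("in", "new", "york");
        System.out.println("nested:    " + nestedQuery(doc));
        System.out.println("rewritten: " + rewrittenQuery(doc));
    }
}
```

In this model the nested form yields no interval for "in new york", because 
the inner disjunction keeps only [2,2] ("york") and "in" at position 0 is not 
adjacent to it, while the rewritten form yields [0,2]. Both forms match the 
document "in york".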

> Improve handling of inner disjunctions in intervals
> ---------------------------------------------------
>
>                 Key: LUCENE-8477
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8477
>             Project: Lucene - Core
>          Issue Type: New Feature
>            Reporter: Alan Woodward
>            Priority: Major
>         Attachments: LUCENE-8477.patch, LUCENE-8477.patch, LUCENE-8477.patch, 
> LUCENE-8477.patch
>
>
> The current implementation of the disjunction interval produced by 
> {{Intervals.or}} is a direct implementation of the OR operator from the Vigna 
> paper.  This produces minimal intervals, meaning that (a) is preferred over 
> (a b), and (b) also over (a b).  This has advantages when it comes to 
> counting intervals for scoring, but also has drawbacks when it comes to 
> matching.  For example, a phrase query for ((a OR (a b)) BLOCK (c)) will not 
> match the document (a b c), because (a) will be preferred over (a b), and (a 
> c) does not match.
> This ticket is to discuss the best way of dealing with disjunctions.



