[ https://issues.apache.org/jira/browse/LUCENE-8477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16602124#comment-16602124 ]
Alan Woodward commented on LUCENE-8477:
---------------------------------------
I can see a couple of options here:
1) Add a new operator, OR_MAX, which doesn't try to minimize its inner intervals
and sorts prefixes last. This deals with ((a OR (a b)) BLOCK c) mentioned in the
description, but it still fails to match in other situations, such as
(b OR (b c)) BLOCK c: here (b c) sorts before (b), so the interval will try to
match (b c c). It also makes the API harder to use, as consumers would need to
understand the semantics of two separate OR operators.
2) Allow IntervalsSource to rewrite itself, so that ((a OR (a b)) BLOCK c)
becomes (a BLOCK c) OR ((a b) BLOCK c). This would be much easier on users,
but I'm not sure how easy it would be to implement: it may require adding a
lot of extra methods to IntervalsSource.
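To make option 2 more concrete, here is a minimal Java sketch of such a rewrite
on a toy query model. Everything in it is hypothetical: the Node/Term/Or/Block
records and the expand() and matches() helpers are illustrative stand-ins, not
Lucene's IntervalsSource API, and only BLOCK (adjacent terms) is modelled. The
rewrite distributes BLOCK over OR, so ((a OR (a b)) BLOCK c) expands to the
sequences (a c) and (a b c), i.e. (a BLOCK c) OR ((a b) BLOCK c), which then
matches the document (a b c).

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model (NOT Lucene's IntervalsSource API) of option 2: rewrite a query so
 * that disjunctions are pulled out of BLOCK into a top-level OR of plain term
 * sequences.
 */
class BlockRewriteDemo {

    interface Node {}
    record Term(String term) implements Node {}
    record Or(List<Node> alternatives) implements Node {}
    record Block(List<Node> parts) implements Node {}

    /** Expands a node into every contiguous term sequence it can match. */
    static List<List<String>> expand(Node node) {
        if (node instanceof Term t) {
            return List.of(List.of(t.term()));
        }
        if (node instanceof Or or) {
            // A disjunction matches anything any of its alternatives matches.
            List<List<String>> out = new ArrayList<>();
            for (Node alt : or.alternatives()) out.addAll(expand(alt));
            return out;
        }
        if (node instanceof Block b) {
            // Cartesian product: each choice for part 1, followed by each choice for part 2, ...
            List<List<String>> out = new ArrayList<>();
            out.add(new ArrayList<>());
            for (Node part : b.parts()) {
                List<List<String>> next = new ArrayList<>();
                for (List<String> prefix : out) {
                    for (List<String> suffix : expand(part)) {
                        List<String> joined = new ArrayList<>(prefix);
                        joined.addAll(suffix);
                        next.add(joined);
                    }
                }
                out = next;
            }
            return out;
        }
        throw new IllegalArgumentException("unknown node: " + node);
    }

    /** The rewritten query matches if any expanded sequence occurs contiguously in doc. */
    static boolean matches(Node query, List<String> doc) {
        for (List<String> seq : expand(query)) {
            for (int start = 0; start + seq.size() <= doc.size(); start++) {
                if (doc.subList(start, start + seq.size()).equals(seq)) return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // ((a OR (a b)) BLOCK c)
        Node query = new Block(List.of(
                new Or(List.of(new Term("a"), new Block(List.of(new Term("a"), new Term("b"))))),
                new Term("c")));
        System.out.println(expand(query));                          // [[a, c], [a, b, c]]
        System.out.println(matches(query, List.of("a", "b", "c"))); // true
    }
}
```

Note that the expansion is a cartesian product, so its size multiplies with
each nested disjunction; a real rewrite would need some way of bounding that.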
> Improve handling of inner disjunctions in intervals
> ---------------------------------------------------
>
> Key: LUCENE-8477
> URL: https://issues.apache.org/jira/browse/LUCENE-8477
> Project: Lucene - Core
> Issue Type: New Feature
> Reporter: Alan Woodward
> Priority: Major
>
> The current implementation of the disjunction interval produced by
> {{Intervals.or}} is a direct implementation of the OR operator from the Vigna
> paper. This produces minimal intervals, meaning that (a) is preferred over
> (a b), and (b) also over (a b). This has advantages when it comes to
> counting intervals for scoring, but also has drawbacks when it comes to
> matching. For example, a phrase query for ((a OR (a b)) BLOCK (c)) will not
> match the document (a b c), because (a) will be preferred over (a b), and (a
> c) does not match.
> This ticket is to discuss the best way of dealing with disjunctions.
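The example in the description, and the trade-off behind option 1 in the
comment, can be reproduced with a small Java toy model. It is a simplification
under stated assumptions, not Lucene's algorithm: at each start position the
disjunction exposes a single alternative chosen by a comparator (shortest first
for the minimal-interval behaviour, longest first for a hypothetical
OR_MAX-style "prefixes last" order), and the block does not backtrack into the
other alternatives. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/**
 * Toy model (NOT Lucene's implementation) of how the order in which a
 * disjunction exposes its intervals affects a non-backtracking BLOCK match.
 */
class DisjunctionOrderDemo {

    // Minimal-interval preference: shorter alternatives first, so (a) beats (a b).
    static final Comparator<List<String>> MINIMAL = Comparator.comparingInt(List::size);
    // Hypothetical OR_MAX-style preference: prefixes last, i.e. longer alternatives first.
    static final Comparator<List<String>> PREFIXES_LAST = MINIMAL.reversed();

    /** Returns true if seq occurs in doc starting at position start. */
    static boolean matchesAt(List<String> doc, List<String> seq, int start) {
        if (start + seq.size() > doc.size()) return false;
        return doc.subList(start, start + seq.size()).equals(seq);
    }

    /**
     * Models ((alt1 OR alt2 ...) BLOCK next): at each start position the
     * disjunction picks ONE matching alternative according to `order`, and the
     * block only checks whether `next` immediately follows that single choice.
     */
    static boolean blockMatches(List<String> doc, List<List<String>> alternatives,
                                String next, Comparator<List<String>> order) {
        for (int start = 0; start < doc.size(); start++) {
            List<List<String>> candidates = new ArrayList<>();
            for (List<String> alt : alternatives) {
                if (matchesAt(doc, alt, start)) candidates.add(alt);
            }
            if (candidates.isEmpty()) continue;
            candidates.sort(order);
            List<String> chosen = candidates.get(0);
            int end = start + chosen.size(); // position just after the chosen interval
            if (end < doc.size() && doc.get(end).equals(next)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        List<List<String>> aOrAb = List.of(List.of("a"), List.of("a", "b"));
        List<List<String>> bOrBc = List.of(List.of("b"), List.of("b", "c"));
        List<String> abc = List.of("a", "b", "c");
        List<String> bc = List.of("b", "c");

        System.out.println(blockMatches(abc, aOrAb, "c", MINIMAL));       // false
        System.out.println(blockMatches(abc, aOrAb, "c", PREFIXES_LAST)); // true
        System.out.println(blockMatches(bc, bOrBc, "c", PREFIXES_LAST));  // false
        System.out.println(blockMatches(bc, bOrBc, "c", MINIMAL));        // true
    }
}
```

Under this model each ordering fixes one of the two examples and breaks the
other, which matches the behaviour the comment describes for OR_MAX.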
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)