[
https://issues.apache.org/jira/browse/LUCENE-3120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036538#comment-13036538
]
Hoss Man commented on LUCENE-3120:
----------------------------------
comment i made on the mailing list regarding this topic...
{quote}
the crux of the issue (as i recall) is that there is really no conceptual
reason why a query for "'john' near 'john', in any order, with slop of Z"
shouldn't match a doc that contains only one instance of "john" ... the first
SpanTermQuery says "i found a match at position X", the second SpanTermQuery
says "i found a match at position Y", and the SpanNearQuery says "the difference
between X and Y is less than Z", therefore i have a match. (The SpanNearQuery
can't fail just because X and Y are the same -- they might be two distinct term
instances, with different payloads perhaps, that just happen to have the same
position).
However, the inOrder==true case works because the SpanNearQuery enforces that "X
must be less than Y", so the same term occurrence can't ever match twice.
{quote}
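To make the reasoning above concrete, here is a minimal sketch in plain Java. It is a simplified model of the two checks as described in the comment, not Lucene's actual SpanNearQuery code: the class name, the method names, and the "gap" definition (positions strictly between X and Y) are assumptions made for illustration.

```java
// Simplified model of the two-clause case; NOT Lucene's implementation.
class SpanNearTwoTermSketch {

    /** Unordered check: only the distance between the two positions matters. */
    static boolean matchesUnordered(int x, int y, int slop) {
        int gap = Math.abs(x - y) - 1; // evaluates to -1 when x == y
        return gap <= slop;
    }

    /** Ordered check: additionally requires x to come strictly before y. */
    static boolean matchesOrdered(int x, int y, int slop) {
        return x < y && (y - x - 1) <= slop;
    }

    public static void main(String[] args) {
        // Hypothetical doc with a single "john" at position 3:
        // both clauses report the same position.
        int onlyJohn = 3;
        System.out.println(matchesUnordered(onlyJohn, onlyJohn, 0)); // true
        System.out.println(matchesOrdered(onlyJohn, onlyJohn, 0));   // false
    }
}
```

In this model the unordered check has no way to tell "the same occurrence twice" from "two occurrences at one position", which is the point made in the comment.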
> span query matches too many docs when two query terms are the same unless
> inOrder=true
> --------------------------------------------------------------------------------------
>
> Key: LUCENE-3120
> URL: https://issues.apache.org/jira/browse/LUCENE-3120
> Project: Lucene - Java
> Issue Type: Bug
> Components: core/search
> Reporter: Doron Cohen
> Priority: Minor
> Fix For: 3.2, 4.0
>
> Attachments: LUCENE-3120.patch, LUCENE-3120.patch
>
>
> spinoff of user list discussion - [SpanNearQuery - inOrder
> parameter|http://markmail.org/message/i4cstlwgjmlcfwlc].
> With 3 documents:
> * "a b x c d"
> * "a b b d"
> * "a b x b y d"
> Here are a few queries (the number in parentheses indicates expected #hits):
> These ones work *as expected*:
> * (1) in-order, slop=0, "b", "x", "b"
> * (1) in-order, slop=0, "b", "b"
> * (2) in-order, slop=1, "b", "b"
> These ones return *too many* hits:
> * (1) any-order, slop=0, "b", "x", "b"
> * (1) any-order, slop=1, "b", "x", "b"
> * (1) any-order, slop=2, "b", "x", "b"
> * (1) any-order, slop=3, "b", "x", "b"
> These ones return *too many* hits as well:
> * (1) any-order, slop=0, "b", "b"
> * (2) any-order, slop=1, "b", "b"
> Each of the above passes when using a phrase query (applying the slop, no
> in-order indication in phrase query).
> This seems related to a known overlapping spans issue - [non-overlapping Span
> queries|http://markmail.org/message/7jxn5eysjagjwlon] - as indicated by Hoss,
> so we might decide to close this bug after all, but I would like to at least
> have the junit that exposes the behavior in JIRA.
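The hit counts in the issue description can be reproduced with a small stand-alone model. The sketch below does not use Lucene and is not the attached junit test; it assumes whitespace tokenization, picks one position per clause, and accepts a doc when (span width minus number of clauses) is at most the slop. The class name, the method names, and the `distinct` flag, which forbids two clauses from reusing one position, are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of any-order matching over the three docs from the issue.
class SpanNearCountSketch {
    static final String[] DOCS = {"a b x c d", "a b b d", "a b x b y d"};

    /** Positions at which term occurs in the whitespace-tokenized doc. */
    static List<Integer> positions(String doc, String term) {
        List<Integer> out = new ArrayList<>();
        String[] tokens = doc.split(" ");
        for (int i = 0; i < tokens.length; i++) {
            if (tokens[i].equals(term)) out.add(i);
        }
        return out;
    }

    /** Tries every way of picking one position per clause, in any order. */
    static boolean matches(String doc, String[] terms, int slop,
                           boolean distinct, int clause, int[] picked) {
        if (clause == terms.length) {
            int min = Integer.MAX_VALUE, max = Integer.MIN_VALUE;
            for (int p : picked) {
                min = Math.min(min, p);
                max = Math.max(max, p);
            }
            return (max - min + 1) - terms.length <= slop;
        }
        for (int p : positions(doc, terms[clause])) {
            boolean reused = false;
            for (int i = 0; i < clause; i++) {
                if (picked[i] == p) reused = true;
            }
            if (distinct && reused) continue; // skip overlapping picks
            picked[clause] = p;
            if (matches(doc, terms, slop, distinct, clause + 1, picked)) return true;
        }
        return false;
    }

    /** Number of docs in DOCS matched by the any-order query. */
    static int countHits(int slop, boolean distinct, String... terms) {
        int hits = 0;
        for (String doc : DOCS) {
            if (matches(doc, terms, slop, distinct, 0, new int[terms.length])) hits++;
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(countHits(0, false, "b", "x", "b")); // 2 (expected 1)
        System.out.println(countHits(0, false, "b", "b"));      // 3 (expected 1)
        System.out.println(countHits(0, true, "b", "x", "b"));  // 1
        System.out.println(countHits(0, true, "b", "b"));       // 1
        System.out.println(countHits(1, true, "b", "b"));       // 2
    }
}
```

With `distinct` set to false the model over-matches in the same way the issue reports; with it set to true the counts agree with the expected numbers listed above, which is consistent with the non-overlapping interpretation.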