I dug some more into this and, for example, compared how PhraseQuery behaves
in these cases ('a' and 'b' are indexed at the same position):

"a b" - NO MATCH (expected)
"a b"~1+ (1+ - any slop >= 1) - MATCH

Because of the mathematics behind PQ, the slop is the edit distance allowed
when matching 'a' and 'b'. So with slop=1, 'b' is moved one place, *onto*
'a', and since they're in the same position, it's a match. If, for instance,
they were indexed in successive positions, to match "b a" you'd need "b
a"~2: 1 move to get 'b' onto 'a' and another to get 'b' after 'a'.
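
To make the arithmetic above concrete, here is a toy model in plain Java. It
is not Lucene code: the class and method names are made up, and it assumes
that the distance is measured by shifting each term back by its offset in the
query phrase and taking the spread of the results, which is my understanding
of how sloppy phrase matching works. Treat it as a sketch only.

```java
// Toy model only, NOT Lucene code; all names are hypothetical.
final class SloppyPhraseSketch {

    /**
     * indexPos[i] is the position at which the i-th query term was indexed;
     * the i-th query term has offset i inside the phrase.
     * Returns the smallest slop at which this model reports a match.
     */
    static int requiredSlop(int[] indexPos) {
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int i = 0; i < indexPos.length; i++) {
            int shifted = indexPos[i] - i; // align each term to the phrase start
            min = Math.min(min, shifted);
            max = Math.max(max, shifted);
        }
        return max - min;
    }

    /** True when the phrase matches at the given slop in this model. */
    static boolean matches(int[] indexPos, int slop) {
        return requiredSlop(indexPos) <= slop;
    }

    public static void main(String[] args) {
        int[] samePosition = {0, 0}; // query "a b", both terms indexed at position 0
        int[] reversed = {1, 0};     // query "b a", indexed as a@0, b@1
        System.out.println("\"a b\"   matches: " + matches(samePosition, 0));
        System.out.println("\"a b\"~1 matches: " + matches(samePosition, 1));
        System.out.println("\"b a\" needs slop: " + requiredSlop(reversed));
    }
}
```

In this model the same-position case needs a slop of 1 and the reversed case
needs a slop of 2, which agrees with the behavior described above.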

SpanNearQuery in unordered mode behaves the same and matches "a b" with
slop=1. In ordered mode it does not match; I guess the rationale was that
someone requested "b follows a"... there is also further evidence in
SpanTermQuery, or actually in TermSpans, which implements end() to return
position+1 (i.e. the end of the span is EXCLUSIVE).

We could add a flag/mode/whatever to allow users to extend SNQ and change
the logic to make the span range INCLUSIVE, thus matching "a b" even when
they are indexed at the same position. But that won't be enough: we'd also
need to add such logic at least to TermSpans, so that its end() returns
'position' rather than position+1.
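
As a rough illustration of the ordered check being discussed, here is a
self-contained sketch. It is not the Lucene source: the class, the methods,
and the allowSamePosition flag are all hypothetical, and the strict
comparison is only my reading of the end1 < end2 vs. end1 <= end2 question
raised in the quoted thread. It models the ordering test alone and says
nothing about how slop would be computed afterwards.

```java
// Sketch only, NOT Lucene code; names and the flag are hypothetical.
final class OrderedSpanSketch {

    /** Exclusive end of a single-term span, as described for TermSpans. */
    static int termEnd(int position) {
        return position + 1;
    }

    /**
     * Is span 1 considered "before" span 2?
     * When both start at the same position, the strict mode requires
     * end1 < end2; the hypothetical relaxed mode accepts end1 <= end2.
     */
    static boolean ordered(int start1, int end1, int start2, int end2,
                           boolean allowSamePosition) {
        if (start1 == start2) {
            return allowSamePosition ? end1 <= end2 : end1 < end2;
        }
        return start1 < start2;
    }

    public static void main(String[] args) {
        int a = 0; // 'a' indexed at position 0
        int b = 0; // 'b' indexed at the same position
        System.out.println("strict:  " + ordered(a, termEnd(a), b, termEnd(b), false));
        System.out.println("relaxed: " + ordered(a, termEnd(a), b, termEnd(b), true));
    }
}
```

With two single-term spans at the same position the starts and ends are
equal, so the strict comparison rejects the pair and the relaxed one accepts
it.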

I think for now I'll just implement my own SpanQuery, and then see if by
opening simple hooks (that don't complicate the API), I could reuse any of
the existing impls.

Shai

On Thu, Sep 4, 2014 at 7:51 PM, Chris Hostetter <[email protected]>
wrote:

>
> : So according to those issues, it seems others also think that end1 <= end2
> : is also a "correct" behavior of SNQ (in case ordered=true, and
>
> A more accurate way to put it may be that *some* people felt like it was
> the least counter-intuitive behavior they could think of, and everybody
> else either didn't have an opinion, didn't share an opinion, or didn't
> understand spans enough to know if they had an opinion.
>
> : start1==start2). Was there a reason these issues were not resolved (e.g.
> : LUCENE-3120)? I'm asking if there was additional discussion where people
> : raised objections against this, or was it just a matter of lack of
> : time/focus?
>
> you could probably find more wisps of discussion about spans and nearness
> in general in the list archives, but i don't remember anything really
> concrete beyond the thread mentioned in LUCENE-3120 (and the earlier
> thread linked to from it) ... and the only reason i remember that is
> because it was easy to find from the jira...
>
>
> https://mail-archives.apache.org/mod_mbox/lucene-dev/201105.mbox/%3c614c529d389a5944b351f7dfb7594f240126d...@uksrpblkexb01.detica.com%3E
>
>
> https://mail-archives.apache.org/mod_mbox/lucene-dev/200612.mbox/%[email protected]%3E
>
> : > https://issues.apache.org/jira/browse/LUCENE-3120
> : > https://issues.apache.org/jira/browse/LUCENE-3371
> : > https://issues.apache.org/jira/browse/LUCENE-3229
>
>
> -Hoss
> http://www.lucidworks.com/
