SpanNearQuery: All matches within slop

Nathan Ashworth Wed, 27 Aug 2008 15:54:45 -0700

A more detailed explanation of the issue was posted about a year ago,
http://www.nabble.com/Possible-bug-in-SpanNearQuery-td10345758.html. I
couldn't find any signs of resolution.


As a brief summary, consider a field with these terms,

  "two one one two"

An ordered SpanNearQuery,

  spanNear([text:two, text:one], 1, true)

yields one span,

  two one [0,2]

An unordered SpanNearQuery,

  spanNear([text:two, text:one], 1, false)

yields three spans,

  two one [0,2]
  one one two [1,4]
  one two [2,4]

Neither query includes the span, "two one one" [0,3].
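
To make "all possible spans" concrete, here is a minimal brute-force sketch in plain Java. It does not use Lucene and is not how NearSpansOrdered or NearSpansUnordered work. The class and method names are hypothetical. It assumes slop is the width of the span minus the two single-token sub-spans, which agrees with the spans listed above, and it prints spans as [start,end] with an exclusive end.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of Lucene.
class AllSpansSketch {
    /** Positions at which the given term occurs in the token stream. */
    static List<Integer> positions(String[] tokens, String term) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            if (tokens[i].equals(term)) {
                out.add(i);
            }
        }
        return out;
    }

    /** Every span that pairs one occurrence of each term within the slop. */
    static List<String> allSpans(String[] tokens, String first, String second,
                                 int slop, boolean inOrder) {
        List<String> spans = new ArrayList<>();
        for (int a : positions(tokens, first)) {
            for (int b : positions(tokens, second)) {
                if (a == b) continue;             // same token cannot match twice
                if (inOrder && a > b) continue;   // ordered: first must precede second
                int start = Math.min(a, b);
                int end = Math.max(a, b) + 1;     // exclusive end
                // Assumed slop definition: span width minus two one-token sub-spans.
                if (end - start - 2 <= slop) {
                    spans.add("[" + start + "," + end + "]");
                }
            }
        }
        return spans;
    }

    public static void main(String[] args) {
        String[] tokens = "two one one two".split(" ");
        System.out.println(allSpans(tokens, "two", "one", 1, true));
        System.out.println(allSpans(tokens, "two", "one", 1, false));
    }
}
```

Under those assumptions, the ordered enumeration contains both [0,2] and [0,3], and the unordered one contains all four pairings, including the [0,3] span that neither query reports.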

 --

This manifests itself as a problem in my work when I want to define an
inverted proximity operation. Say I want to find all instances of the word
"one" that don't follow the word "two" within some slop value. My initial
thought was that this query,

  spanNot(text:one, spanNear([text:two, text:one], 1, true))

would work. With the example string, I would have expected 0 spans returned.
However, that query returns a span, "one" [2,3]. I understand now why this
happens: the only span the SpanNearQuery reports is [0,2], which does not
overlap [2,3].

As a result of SpanNearQuery not matching all possible spans, the
SpanNotQuery operator cannot provide a logically inverted set of all
possible spans. Any compound SpanQuery that is dependent on that inverted
set being complete will be glaringly inaccurate.
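
The effect on the exclusion can be sketched in plain Java as well. This is an illustration only, not Lucene's SpanNotQuery implementation. It assumes spans are half-open [start,end) intervals and that an include span is dropped when it overlaps any exclude span. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of Lucene.
class SpanNotSketch {
    /** Keep each include span [start,end) that overlaps no exclude span. */
    static List<int[]> spanNot(List<int[]> include, List<int[]> exclude) {
        List<int[]> kept = new ArrayList<>();
        for (int[] in : include) {
            boolean overlaps = false;
            for (int[] ex : exclude) {
                // Half-open intervals overlap when each starts before the other ends.
                if (in[0] < ex[1] && ex[0] < in[1]) {
                    overlaps = true;
                    break;
                }
            }
            if (!overlaps) {
                kept.add(in);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // "one" occurs at positions 1 and 2 of "two one one two".
        List<int[]> ones = Arrays.asList(new int[]{1, 2}, new int[]{2, 3});
        // Spans the ordered SpanNearQuery reports today.
        List<int[]> reported = Arrays.asList(new int[]{0, 2});
        // Spans it would report if every match within the slop were included.
        List<int[]> complete = Arrays.asList(new int[]{0, 2}, new int[]{0, 3});

        System.out.println(spanNot(ones, reported).size());
        System.out.println(spanNot(ones, complete).size());
    }
}
```

With only the reported span excluded, "one" [2,3] survives. With the complete set excluded, nothing survives, which is the result I expected from the original query.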

I've looked at the code enough to know that I would have to look at it
a lot longer in order to fully understand the algorithm. Is there any
general interest in modifying NearSpanOrdered/NearSpanUnordered to include
all possible spans?

Thanks,

Nathan

