A test of the ordered span query with three terms:
w1 w2 w3
and slop 1 against document:
w1 w3 w2 w3
fails.
Thanks for catching this. It would be helpful if you could submit a JUnit test which tests this case.
The javadoc of SpanNearQuery (1.4 rc3) says:
Matches spans which are near one another. One can specify slop, the maximum
number of intervening unmatched positions, as well as whether matches are
required to be in-order.
But the span search seems to scan the document from
w1 w3 w2
to
w3 w2 w3
instead of using the slop to match w1 . w2 w3 (the dot marking the one skipped position).
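For a standalone check of the expected result, a brute-force oracle could be compared against SpanNearQuery. The sketch below is hypothetical (the class and method names are made up, and it is not Lucene code). It assumes one term per clause and reads slop, as the javadoc does, as the number of intervening unmatched positions between the first and last matched term. It tries every in-order placement of the terms.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical brute-force oracle (not part of Lucene): does an ordered,
 * one-term-per-clause "near" match exist within the given slop?
 * Slop = number of unmatched positions between the first and last matched term.
 */
public class OrderedSlopOracle {

    /** True if the terms occur in order in doc with at most maxSlop unmatched positions between them. */
    public static boolean matches(List<String> doc, List<String> terms, int maxSlop) {
        if (terms.isEmpty()) {
            return false; // an empty query matches nothing
        }
        return search(doc, terms, maxSlop, 0, -1, -1);
    }

    // Try every position for terms[i] strictly after prevPos (ordered, non-overlapping).
    private static boolean search(List<String> doc, List<String> terms, int maxSlop,
                                  int i, int firstPos, int prevPos) {
        if (i == terms.size()) {
            // The match covers firstPos..prevPos inclusive; unmatched = width - number of terms.
            int unmatched = (prevPos - firstPos + 1) - terms.size();
            return unmatched <= maxSlop;
        }
        for (int p = prevPos + 1; p < doc.size(); p++) {
            if (doc.get(p).equals(terms.get(i))) {
                int start = (i == 0) ? p : firstPos;
                if (search(doc, terms, maxSlop, i + 1, start, p)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> doc = Arrays.asList("w1", "w3", "w2", "w3");
        List<String> query = Arrays.asList("w1", "w2", "w3");
        // w1 at 0, w2 at 2, w3 at 3: one unmatched position (the w3 at 1).
        System.out.println(matches(doc, query, 1)); // true
        System.out.println(matches(doc, query, 0)); // false
    }
}
```

Under these assumptions the document w1 w3 w2 w3 should match with slop 1 but not with slop 0. This is the behaviour a JUnit test of SpanNearQuery would be expected to reproduce.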
I think this is indeed the problem. Currently it always increments the earliest span. Rather, I think it should increment the first span that is still within slop of the earliest span but is out of order. So, in your example, when the spans are [w1 w3 w2], it should increment w3: its start is zero words after the end of w1 (so it is within the slop), but it is out of order, since w2 is required after w1. I think this rule generalizes to larger queries.
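A simplified take on that rule might look like the sketch below. It is hypothetical and is not Lucene's NearSpans code. It assumes each clause is a single term given as a sorted array of positions, and it does not prune by slop while advancing. For each candidate start of the first term, it advances every later sub-span that is out of order with its predecessor, instead of advancing the earliest one. Then it checks the slop.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not Lucene code): ordered near matching that advances
 * out-of-order sub-spans rather than always the earliest one.
 * Assumes one term per clause, each given as a sorted array of positions.
 */
public class OrderedNearSketch {

    /** Returns the start positions of ordered matches with at most maxSlop unmatched positions. */
    public static List<Integer> matchStarts(int[][] positions, int maxSlop) {
        int n = positions.length;
        int[] idx = new int[n]; // current index into each term's position list
        List<Integer> starts = new ArrayList<>();
        while (idx[0] < positions[0].length) {
            boolean exhausted = false;
            // Advance each later sub-span while it is not after its predecessor.
            for (int i = 1; i < n && !exhausted; i++) {
                while (idx[i] < positions[i].length
                        && positions[i][idx[i]] <= positions[i - 1][idx[i - 1]]) {
                    idx[i]++;
                }
                exhausted = idx[i] >= positions[i].length;
            }
            if (exhausted) {
                break; // some later term has no occurrence after its predecessor
            }
            int start = positions[0][idx[0]];
            int end = positions[n - 1][idx[n - 1]];
            int unmatched = (end - start + 1) - n;
            if (unmatched <= maxSlop) {
                starts.add(start);
            }
            idx[0]++; // try the next occurrence of the first term
        }
        return starts;
    }

    public static void main(String[] args) {
        // Document "w1 w3 w2 w3": positions of w1, w2 and w3 respectively.
        int[][] positions = { {0}, {2}, {1, 3} };
        System.out.println(matchStarts(positions, 1)); // [0]
        System.out.println(matchStarts(positions, 0)); // []
    }
}
```

Because each sub-span only moves to the first position after its predecessor, it lands as early as possible for a given start. For single-term clauses this gives the smallest possible slop, and no sub-span ever has to move backwards. Clauses that are themselves multi-term spans would need more care than this sketch takes.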
Does this sound right? If so, then I'll try to fix it. I may not get to it for a few weeks however, since I'm busy this week and on vacation next week.
Anyway, does this mean that I should not use an ordered SpanNearQuery with some slop with more than 2 subqueries?
Until we fix this, yes. Thanks for identifying this bug.
I'm testing a parser for the span queries, so posting self-contained test code would require some coding around that parser.
Will you be able to contribute the parser? It would be good to have a SpanQuery parser in Lucene, if it is general-purpose.
I wouldn't mind doing that, but it would be superfluous if this is the intended behaviour.
It should be fairly simple to code a standalone test case, no?
Thanks again,
Doug