Doug,

On Tuesday 06 April 2004 18:11, Doug Cutting wrote:
> Paul Elschot wrote:
> > A test of the ordered span query with three terms:
> > w1 w2 w3
> > and slop 1 against document:
> > w1 w3 w2 w3
> > fails.
>
> Thanks for catching this. It would be helpful if you could submit a
> JUnit test which tests this case.
I'll try.

> > The javadoc (1.4 rc3) of SpanNearQuery gives:
> > Matches spans which are near one another. One can specify slop, the
> > maximum number of intervening unmatched positions, as well as whether
> > matches are required to be in-order.
> >
> > But the span search seems to scan the document from
> >
> > w1 w3 w2
> >
> > to
> >
> > w3 w2 w3
> >
> > instead of allowing for the slop to match w1 . w2 w3.
>
> I think this is indeed the problem. Currently it always increments the
> earliest span. Rather I think it should increment the first span, still
> within slop of the earliest span, that is out of order. So, in your

Yes, when the current match length and slop still allow.

> example, when the spans are [w1 w3 w2], it should increment w3, since
> its start is zero words after the end of w1 (slop is zero) but it is
> out of order: w2 is required after w1. I think this rule generalizes to
> larger queries.
>
> Does this sound right? If so, then I'll try to fix it. I may not get

It sounds right, but I'm not certain whether it generalizes to larger queries. The question is: could incrementing the earliest span that is out of order, but within the allowed slop, cause the search window to miss the first ordered occurrence within the allowed slop at or after the beginning of the current search window?
I can't answer that question in a few minutes, so I'd rather spend my time programming the test case for now. (What was that joke again about a fool, wise men, and questions?)

> to it for a few weeks however, since I'm busy this week and on vacation
> next week.
>
> > Anyway, does this mean that I should not use an ordered SpanNearQuery
> > with some slop with more than 2 subqueries?
>
> Until we fix this, yes. Thanks for identifying this bug.

It's easy to work around: one only needs to nest some ordered span queries with 2 subqueries each. This does not give exactly the same behaviour, but it's good enough in practice.
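For the JUnit test case, a brute-force oracle could help decide what the ordered span query ought to return. The class below is a hypothetical test helper, not Lucene's SpanNearQuery code. It assumes every clause is a single term occupying one position, and it counts slop as the number of unmatched positions inside the matched span, following the javadoc wording quoted above. For each occurrence of the first term it takes the earliest following occurrence of each later term, which gives the shortest ordered span from that start.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Brute-force oracle for an ordered, sloppy "span near" match of single terms.
 * Hypothetical test helper; this is not Lucene's SpanNearQuery implementation.
 */
class OrderedSpanCheck {

    /**
     * Returns the start positions of ordered matches of terms in doc that have
     * at most slop unmatched positions inside the matched span.
     */
    static List<Integer> matchStarts(String[] doc, String[] terms, int slop) {
        List<Integer> starts = new ArrayList<>();
        for (int first = 0; first < doc.length; first++) {
            if (!doc[first].equals(terms[0])) continue;
            // Greedily take the earliest following occurrence of each later term;
            // for a fixed start this yields the shortest ordered span.
            int pos = first;
            boolean complete = true;
            for (int t = 1; t < terms.length && complete; t++) {
                int next = pos + 1;
                while (next < doc.length && !doc[next].equals(terms[t])) next++;
                if (next < doc.length) pos = next; else complete = false;
            }
            if (!complete) continue;
            int unmatched = (pos - first + 1) - terms.length; // span length minus matched terms
            if (unmatched <= slop) starts.add(first);
        }
        return starts;
    }

    public static void main(String[] args) {
        String[] doc = {"w1", "w3", "w2", "w3"};
        String[] query = {"w1", "w2", "w3"};
        System.out.println(matchStarts(doc, query, 1)); // [0]: matches w1 . w2 w3
        System.out.println(matchStarts(doc, query, 0)); // []
    }
}
```

With slop 1 the oracle finds the match w1 . w2 w3 starting at position 0, which is exactly the case reported as failing; with slop 0 there is no match.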
> > I'm testing a parser for the span queries, so posting self contained
> > test code would require some coding around that parser.
>
> Will you be able to contribute the parser? It would be good to have a
> SpanQuery parser in Lucene, if it is general-purpose.

I would like to contribute the parser, and I hope I will be allowed to do so. It is quite general, but not general purpose: the target audience is power users. It does not use an analyzer and there is no default operator. When/if the time comes I'll ask here how to contribute.

> ...
> Thanks again,

My pleasure, have a good vacation.

Paul.

P.S. Only slightly off topic. Are you familiar with:
http://citeseer.ist.psu.edu/457664.html
Fast Algorithms for k-word Proximity Search (2001)
Kunihiko SADAKANE, Hiroshi IMAI
It's about finding minimal intervals of k terms with arbitrary order.
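For comparison with the ordered case, the unordered minimal-interval problem can be sketched with a standard sliding window. This is not necessarily the algorithm from the Sadakane and Imai paper; the class and method names are hypothetical, and the sketch assumes the k terms are distinct and each occupies one token position. A window is reported only when neither end token can be dropped, i.e. it is minimal.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Sliding-window sketch for minimal intervals containing all of k distinct
 * terms in any order. Hypothetical helper; not taken from the paper.
 */
class MinimalWindows {

    /** Returns every minimal [start, end] window of doc covering all terms. */
    static List<int[]> minimalWindows(String[] doc, Set<String> terms) {
        Map<String, Integer> count = new HashMap<>(); // occurrences inside the window
        List<int[]> result = new ArrayList<>();
        int covered = 0; // number of distinct terms currently in the window
        int left = 0;
        for (int right = 0; right < doc.length; right++) {
            String w = doc[right];
            if (!terms.contains(w)) continue;
            if (count.merge(w, 1, Integer::sum) == 1) covered++;
            // Drop leading tokens that are not terms or occur again later in the window.
            while (left < right
                    && (!terms.contains(doc[left]) || count.get(doc[left]) > 1)) {
                if (terms.contains(doc[left])) count.merge(doc[left], -1, Integer::sum);
                left++;
            }
            // Minimal when all terms are covered and both end tokens occur only once.
            if (covered == terms.size() && count.get(w) == 1) {
                result.add(new int[] {left, right});
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] doc = {"a", "x", "b", "a", "c", "b"};
        Set<String> terms = new HashSet<>(Arrays.asList("a", "b", "c"));
        for (int[] win : minimalWindows(doc, terms)) {
            System.out.println(win[0] + ".." + win[1]);
        }
    }
}
```

On the sample document this prints 2..4 ("b a c") and 3..5 ("a c b"); the longer window 0..4 is not reported because it contains 2..4.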