I'd have to dig in to be of much help. Hard to remember this stuff.

0:a 1:a 1:b 2:c 2:d 3:e 3:a 4:f 4:g 5:h 5:i 6:j 6:a 7:b 7:k 8:k

span 0 to 8
span 1 to 8
span 3 to 8
span 6 to 8

I think those are the right 4. You start on the left and work right. Spans always start after the last one started. So first you would find 0 to 8. After 0, 1 to 8. After 1, 3 to 8, and after 3, 6 to 8. That makes sense. You never see 9 because the 8 comes first, and you can end as many times on a pos as you want, but you never start a span at the same pos twice. So I think this is right.

The second question I am less sure about without looking at the code. I think it's because each payload can only be loaded once. So the first time you hit 0 to 8, you get both payloads, but for every other span that hits 8, that payload was already loaded? So you get all of the payloads you should; you just don't get duplicates in each span. I'd have to think harder about it, but overall it appears right...? All the Spans are subspans of a larger Span, right?

- Mark

Michael McCandless wrote:
> Under LUCENE-1458, I'm hitting a curious test failure in
> TestPositionsIncrement.testPayloadsPos0. The failure happens because
> the codec I'm testing (pulsing codec) allows you to retrieve the same
> payload more than once if the term was pulsed (inlined into terms
> dict), whereas w/ trunk you can only retrieve the payload once.
>
> But in debugging the failure, I'm struggling with what the correct
> behavior of SpanNearQuery really should be.
>
> The test creates a single doc with one analyzed field, with these
> single letter position:tokens:
>
>   0:a 1:a 1:b 2:c 2:d 3:e 3:a 4:f 4:g 5:h 5:i 6:j 6:a 7:b 7:k 8:k
>
> every token has a payload.
>
> Then it makes:
>
>   SpanNearQuery
>     SpanTermQuery term=a
>     SpanTermQuery term=k
>
> Term "a" occurs four times (positions 0, 1, 3, 6) and "k" occurs 2
> times (positions 7, 8).
>
> My first question is: what spans is SpanNearQuery supposed to
> enumerate?
> Right now trunk does these four:
>
>   span 0 to 8
>   span 1 to 8
>   span 3 to 8
>   span 6 to 8
>
> which represents position 7 of "k" mated with all positions of "a".
> (remember end is 1+, so "k"'s position 7 turned into 8). How come the
> position 8 occurrence of "k" was not included in any spans?
>
> My second question is: when you call getPayload() on each span, what
> should you get? Right now trunk does this:
>
>   span 0 to 8
>     payload: pos: 0
>     payload: pos: 7
>   span 1 to 8
>     payload: pos: 0
>   span 3 to 8
>     payload: pos: 3
>   span 6 to 8
>     payload: pos: 6
>
> The first span properly includes the payload for "a" (pos: 0) and for
> "k" (pos: 7), but the subsequent three do not include the payload
> for "k". Shouldn't you get all payloads associated w/ the span?
>
> Mike
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-dev-h...@lucene.apache.org
>

--
- Mark

http://www.lucidimagination.com
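
The left-to-right pairing described in the reply can be sketched as a toy model. This is only an illustration of the rule as stated there (each "a" start is matched with the nearest "k" to its right, and no two spans share a start); it is not Lucene's span code. The function name enumerate_spans is made up for the example, and the slop is assumed to be large enough to admit every pairing.

```python
# Toy model of the enumeration rule from the reply above.
# Not Lucene's implementation; slop is assumed to be effectively unlimited.

def enumerate_spans(start_positions, end_positions):
    """Pair each start position with the first end-term position after it.

    Returns (start, end) tuples where end is exclusive, i.e. the position
    of the matched end term plus one.
    """
    spans = []
    sorted_ends = sorted(end_positions)
    for start in sorted(start_positions):
        # Only end-term occurrences strictly to the right can close the span.
        later = [pos for pos in sorted_ends if pos > start]
        if not later:
            break  # nothing left to the right, so no further spans
        spans.append((start, later[0] + 1))
    return spans


a_positions = [0, 1, 3, 6]   # positions of term "a" in the test document
k_positions = [7, 8]         # positions of term "k" in the test document

spans = enumerate_spans(a_positions, k_positions)
for start, end in spans:
    print(f"span {start} to {end}")
```

With the positions from the test, this yields the same four spans that trunk reports. The position 8 occurrence of "k" never appears because position 7 is always reached first for every start of "a".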