Under LUCENE-1458, I'm hitting a curious test failure in TestPositionsIncrement.testPayloadsPos0. The failure happens because the codec I'm testing (the pulsing codec) lets you retrieve the same payload more than once if the term was pulsed (inlined into the terms dict), whereas with trunk you can only retrieve the payload once.
But in debugging the failure, I'm struggling with what the correct behavior of SpanNearQuery really should be.

The test creates a single doc with one analyzed field, with these single-letter position:token pairs:

  0:a 1:a 1:b 2:c 2:d 3:e 3:a 4:f 4:g 5:h 5:i 6:j 6:a 7:b 7:k 8:k

Every token has a payload. Then it makes:

  SpanNearQuery
    SpanTermQuery term=a
    SpanTermQuery term=k

Term "a" occurs four times (positions 0, 1, 3, 6) and "k" occurs two times (positions 7, 8).

My first question is: what spans is SpanNearQuery supposed to enumerate? Right now trunk does these four:

  span 0 to 8
  span 1 to 8
  span 3 to 8
  span 6 to 8

which represent position 7 of "k" mated with all positions of "a". (Remember that a span's end is 1 + the last position, so "k"'s position 7 turns into 8.) How come the position 8 occurrence of "k" was not included in any spans?

My second question is: when you call getPayload() on each span, what should you get? Right now trunk does this:

  span 0 to 8
    payload: pos: 0
    payload: pos: 7
  span 1 to 8
    payload: pos: 0
  span 3 to 8
    payload: pos: 3
  span 6 to 8
    payload: pos: 6

The first span properly includes the payload for "a" (pos: 0) and for "k" (pos: 7), but the subsequent three do not include the payload for "k". Shouldn't you get all payloads associated with the span?

Mike
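
P.S. To make the first question concrete, here is a small standalone sketch in plain Java. It uses no Lucene classes and is not how SpanNearQuery is implemented; the class and method names (SpanPairingSketch, spansFirstKOnly, spansAllPairs) are made up for illustration. It only reproduces the token positions from the test and contrasts the four spans trunk reports with what you would get if every "a" were paired with every "k". Whether the second enumeration is the intended behavior is exactly the open question.

```java
import java.util.ArrayList;
import java.util.List;

public class SpanPairingSketch {

    // Token stream from the test, written as position:token pairs.
    static final String[] TOKENS = {
        "0:a", "1:a", "1:b", "2:c", "2:d", "3:e", "3:a", "4:f",
        "4:g", "5:h", "5:i", "6:j", "6:a", "7:b", "7:k", "8:k"
    };

    // Collect every position at which the given term occurs.
    static List<Integer> positionsOf(String term) {
        List<Integer> result = new ArrayList<>();
        for (String token : TOKENS) {
            String[] parts = token.split(":");
            if (parts[1].equals(term)) {
                result.add(Integer.parseInt(parts[0]));
            }
        }
        return result;
    }

    // Format a span the way the email does; the end is 1 + the last position.
    static String span(int posA, int posK) {
        int start = Math.min(posA, posK);
        int end = Math.max(posA, posK) + 1;
        return start + " to " + end;
    }

    // What trunk is observed to report: each "a" paired with the first "k" only.
    static List<String> spansFirstKOnly() {
        List<String> spans = new ArrayList<>();
        int firstK = positionsOf("k").get(0);
        for (int a : positionsOf("a")) {
            spans.add(span(a, firstK));
        }
        return spans;
    }

    // Hypothetical alternative: each "a" paired with every "k".
    static List<String> spansAllPairs() {
        List<String> spans = new ArrayList<>();
        for (int a : positionsOf("a")) {
            for (int k : positionsOf("k")) {
                spans.add(span(a, k));
            }
        }
        return spans;
    }

    public static void main(String[] args) {
        System.out.println("a positions: " + positionsOf("a"));
        System.out.println("k positions: " + positionsOf("k"));
        System.out.println("first k only: " + spansFirstKOnly());
        System.out.println("all pairs:    " + spansAllPairs());
    }
}
```

The first list matches the four spans above; the second has eight entries, adding a "to 9" span for each "a" to cover the "k" at position 8.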