Michael McCandless wrote:
> Thanks Mark! -- comments below:
>
> On Fri, Sep 11, 2009 at 3:34 PM, Mark Miller <markrmil...@gmail.com> wrote:
>
>> I'd have to dig in to be of much help. Hard to remember this stuff.
>>
>> 0:a 1:a 1:b 2:c 2:d 3:e 3:a 4:f 4:g 5:h 5:i 6:j 6:a 7:b 7:k 8:k
>>
>> span 0 to 8
>> span 1 to 8
>> span 3 to 8
>> span 6 to 8
>>
>> I think those are the right 4. You start on the left and work
>> right. Spans always start after the last one started.
>>
>
> OK, so SpanNearQuery always takes its left-most clause, releases a
> span, and then advances it? What if there is a tie for two left-most
> clauses?
>
> Eg if I had included "b" as a clause, here, it'd tie with "a" at
> position 1 -- hmm, I just tested this: you get "span 1 to 8" twice:
>
> span 0 to 8
>   payload: pos: 7
>   payload: pos: 1
>   payload: pos: 0
> span 1 to 8
>   payload: pos: 0
> span 1 to 8
>   payload: pos: 3
> span 3 to 8
>   payload: pos: 6
> span 6 to 8
>   payload: pos: 6
>
> Also, the payloads sort of shifted down (eg "pos: 3" now shows up in
> the "span 1 to 8" but before showed up in "span 3 to 8"), and "pos: 1"
> (for b) was added under "span 0 to 8".
>
> (NOTE: confusingly, the "payload: pos: N" is off by one, in this test,
> ie the "real" position is N+1).
>
>> So first you would find: 0 to 8. After 0, 1 to 8.
>> After 1, 3 to 8, and after 3, 6 to 8. That makes sense.
>> You never see 9 because the 8 comes first and you can
>> end as many times on a pos as you want - but you don't
>> ever start a span at the same pos. So I think this is right.
>>
>
> I think (if I were using SpanNearQuery) I'd want it to somehow include
> 9, but I'm not quite sure how. This test sets slop to 30, so maybe
> I'd want to see 0-9, 1-9, 3-9, 6-9? Ie the "maximal" spans possible.
> EG my app will never see "k"'s payload from its occurrence at position
> 8.
>
You might want it, but that's not how Spans currently work - they are not exhaustive.
They start at the left and march right - each Span always starts after the last one started, but ends at the closest match. It's just how the query works, and so when payloads were grafted on ... Spans are made to match documents quickly - not to enumerate all matches in a document (I guess).
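
To make that march concrete, here is a rough Java sketch. It is not Lucene code and not the actual SpanNearQuery implementation: it only models each clause as a sorted array of term positions, assumes unordered matching over single-term clauses, and assumes a span's end is one past the right-most matching position (which is what "span 0 to 8" with "k" at position 7 suggests). The names SpanMarchSketch and enumerate are made up for the illustration.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model of the left-to-right march described above. Hypothetical names,
 * no Lucene dependency: a clause is just a sorted array of term positions.
 */
final class SpanMarchSketch {

    /**
     * Reports one "span START to END" string per match. END is exclusive
     * (right-most current position + 1), an assumption taken from the
     * output quoted in this thread.
     */
    static List<String> enumerate(int[][] clauses, int slop) {
        List<String> spans = new ArrayList<>();
        if (clauses.length == 0) {
            return spans;
        }
        for (int[] clause : clauses) {
            if (clause.length == 0) {
                return spans; // a clause with no positions can never match
            }
        }
        int[] cursor = new int[clauses.length]; // current index into each clause
        while (true) {
            // Find the left-most clause and the right-most current position.
            int minClause = 0;
            int maxPos = clauses[0][cursor[0]];
            for (int i = 1; i < clauses.length; i++) {
                int pos = clauses[i][cursor[i]];
                if (pos < clauses[minClause][cursor[minClause]]) {
                    minClause = i;
                }
                if (pos > maxPos) {
                    maxPos = pos;
                }
            }
            int start = clauses[minClause][cursor[minClause]];
            int end = maxPos + 1;
            // Each single-term clause covers one position; the rest is slop.
            if (end - start - clauses.length <= slop) {
                spans.add("span " + start + " to " + end);
            }
            // Only the left-most clause advances; stop once it runs out.
            cursor[minClause]++;
            if (cursor[minClause] == clauses[minClause].length) {
                return spans;
            }
        }
    }
}
```

Calling enumerate with a = {0, 1, 3, 6}, k = {7, 8} and slop 30 gives the four spans from the first message. Adding b = {1, 7} as a third clause gives "span 1 to 8" twice, because "a" and "b" each take a turn as the left-most clause at position 1. In both runs "k" is never the left-most clause, so it never advances to position 8 and no span ends at 9.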
You might want exhaustive matching for highlighting as well - but that takes different algorithms ...
>
>> The second question I am less sure about without looking at code.
>> I think it's because each payload can only be loaded once. So the first
>> time you hit 0 to 8, you get both payloads - but every other span that
>> hits 8, that payload was already loaded ? So you get all of the payloads
>> you should, you're just not getting duplicates in each span. I'd have to think
>> harder about it - but overall it appears right ... ?
>>
>
> Yeah that is the reason why you only see each payload once, but I'm
> not sure that's "right". I guess an app can always store away each
> payload and pull it later, but eg if the app wants to score each span
> using the payloads from all occurrences of clauses within it, you
> can't trust getPayloads for that.
>
Fair enough - my idea of what appears right is tainted. I finished getting NearSpansOrdered to work with payloads and I've fixed some bugs, but I've never considered how it *should* work - I've just cursed and moved on, trying to get what we have to work. In the end, my working definition of "works" was: when I ask for the payloads back, do I end up with a bag of all the payloads that the Spans touched? I think you do. If each sub Span duplicated payloads, that might be right for some apps and a pain for others, right? You can't count on the order of the payloads or anything, I think (it's been a while) - so it's just like getting back a bag of those that matched.

Anyway - I'm not happy with a few things, but it was fairly hard just getting things to work at this level. I'd love for NearSpansOrdered to actually lazy load the payloads, for one.
>
>> All the Spans are subspans of a larger Span right?
>>
Sorry ;) I'm practicing with my chaotic brain so that one day I may actually be halfway clear. I meant, all those Spans came from one query - so you got your bag of payloads, right?
If each Span was a separate entity, it would obviously be way wrong - but from a single SpanQuery, at least you got all the payloads in some form :)

I'd love to be able to give some more intelligent responses here, but I'd have to dig back into the code again first. Spans were hard enough to deal with without adding these payloads to the mix :)
>
> Not sure what you mean here?
>
> Mike
>

--
- Mark

http://www.lucidimagination.com

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org