On Saturday 12 September 2009 14:40:28 Mark Miller wrote:
> Michael McCandless wrote:
> > OK thanks for the responses.  This is indeed tricky stuff!
> >
> > On Sat, Sep 12, 2009 at 12:28 AM, Mark Miller <markrmil...@gmail.com> wrote:
> >
> >> They start at the left and march right - each Span always starting
> >> after the last started,
> >
> > That's not quite always true -- eg I got span 1-8, twice, once I added
> > "b" as a clause to the SNQ.
> >
> Mmm - right - depends on how you look at it I think - it is less simple
> with terms at multiple positions, in that now each Span doesn't start
> in the *position* after the last - but if you line up the terms like you
> did, it's still the same - the first 1 - 8 starts at the first term at
> pos 1, and the next 1 to 8 starts at the second term at pos 1. One
> starts after the other (though if you think Lucene positions, I realize
> they virtually start at the same spot).
>
> >> You might want exhaustive for highlighting as well - but it's
> >> different algorithms ...
> >
> > Yeah, how we would represent spans for highlighting is tricky... we
> > had discussed this ("how to represent spans for aggregate queries")
> > recently, I think under LUCENE-1522.
> >
> > I think we'd have to return a tree structure, that mirrors the query's
> > tree structure, to hold the spans, rather than try to enumerate
> > ("denormalize") all possible expansions.  Each leaf node would hold
> > actual data (position, term, payload, etc.), and then the tree nodes
> > would express how they are and/or'd/near'd together.  My app could then
> > walk the tree to compute any combination I wanted.
> >
> >> In the end, I accepted my definition of works as - when I ask for
> >> the payloads back, will I end up with a bag of all the payloads that
> >> the Spans touched. I think you do.
> >
> > Yeah I think you do, except each payload is only returned once.  So
> > it's only the first span that hits a payload that will return it.
> >
> > So it sounds like SNQ just isn't guaranteed to be exhaustive in how it
> > enumerates the spans, eg I'll never see that 2nd occurrence of "k",
> > nor its associated payload.
> >
> Not only not guaranteed, but it's just not going to happen - it's not
> how spans match. If I say find n within 300 of m with the following:
>
> n m m m m m m m m m m m m m m m m m m m m m m m m m m m m m m m m m m
> m m m m m m m m m m m m
>
> Only the first m will match. It will start at the left, find the n, then
> say great, an m within 300, this doc matches, we are done. There is
> not another n to start on or finish on to the right. It doesn't then
> touch the next 300 m's - just the way Doug implemented them from what I
> can tell. It's only exhaustive from the
> left - find m within 300 of n, order matters (m first)
>
> m m m m m m m m m m m m m m m m m m n
>
> This will be a bunch of spans - start at the left - the first m to n
> matches, then the second m - n matches, then the third m to n matches,
> and so on as we move right.

In the ordered case that last one should only match once, against the last m.

Regards,
Paul Elschot

> > For now I'll just match this behavior ("can only load payload once")
> > in all codecs in LUCENE-1458... the test passes again once I do that.
> >
> >> I meant, all those Spans came from one query - so you got your bag
> >> of payloads right? If each Span was a separate entity, it would
> >> obviously be way wrong - but from a single SpanQuery, at least you
> >> got all the payloads in some form :)
> >
> > Right, this is all one query... but the payload for the 2nd
> > occurrence of "k" was never included in any span so I didn't get "all"
> > payloads.
> >
> You got all the payloads the query matched - I think you need a
> different query (or we change the Spans algorithm completely)
>
> > Maybe if/once we incorporate spans into Lucene's normal queries
> > (optionally, so there's no performance hit if you don't ask for them)
> > we can re-visit these issues.
> >
> > Mike
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: java-dev-h...@lucene.apache.org
>
> --
> - Mark
>
> http://www.lucidimagination.com
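
For anyone following along, here is a small toy sketch of the "start at the left and march right" enumeration as Mark describes it in the quoted text. It is not Lucene's NearSpans code and uses no Lucene classes: the class ToySpans, the method orderedSpans, and the simplified slop check (plain position distance) are all made up for illustration. It emits one span per start position and never revisits a start. As Paul's reply points out, the real ordered case may report only a single span for the second example, so treat this purely as a model of the description, not of actual behavior.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of left-to-right span enumeration. NOT Lucene's implementation. */
public class ToySpans {

    /**
     * Scans left to right. For each position holding `first`, pairs it with the
     * nearest later position holding `second` that is at most `slop` positions
     * away (a simplification of Lucene's slop), and emits one {start, end} span.
     * The end value is exclusive in this toy model.
     */
    public static List<int[]> orderedSpans(String[] doc, String first, String second, int slop) {
        List<int[]> spans = new ArrayList<>();
        for (int start = 0; start < doc.length; start++) {
            if (!doc[start].equals(first)) {
                continue;
            }
            for (int end = start + 1; end < doc.length && end - start <= slop; end++) {
                if (doc[end].equals(second)) {
                    spans.add(new int[] {start, end + 1});
                    break; // first match for this start wins; later ones are never touched
                }
            }
        }
        return spans;
    }

    public static void main(String[] args) {
        String[] nThenMs = "n m m m m".split(" ");
        String[] msThenN = "m m m m n".split(" ");
        // One n on the left: a single span, ending at the first m.
        System.out.println(orderedSpans(nThenMs, "n", "m", 300).size()); // prints 1
        // m first, then n: one span per m in this toy model.
        System.out.println(orderedSpans(msThenN, "m", "n", 300).size()); // prints 4
    }
}
```

In the first document only the first m is ever paired with the n, which is why the payloads on the later m's would never be reached by any span.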