Only w/ ScoreDocs we reuse the same instance. So I guess we'd like to do the same here.
Seems like providing a TopSpansCollector is what you want, only unlike TopFieldCollector, which populates the fields post search, you'd like to do it during search.

I've been typing and deleting suggestions for the past 5 minutes. I guess it's late for me, so I'll sleep on it. Sorry :)

Shai

On Thu, Aug 6, 2009 at 11:39 PM, Grant Ingersoll <gsing...@apache.org> wrote:

> On Aug 6, 2009, at 4:25 PM, Shai Erera wrote:
>
>> But still you might collect spans for docs unnecessarily during
>> processing. If a doc is added to the PQ and later removed, then the spans
>> collection was just a waste of time (unless the collection comes in free
>> during query processing).
>
> sure, but that is just the nature of the PQ, things get moved off. We
> collect ScoreDocs right now, too, that get removed, too. We presumably are
> only storing a few more bytes: start (int), end (int) and payload (byte
> array, presumably small).
>
>> Also, if you build a paging search UI, then as soon as the user clicks
>> "Next" for the first time, you'll collect the Spans for the first 10 docs
>> (10 is an example) unnecessarily, because they won't be used.
>
> Again, likewise for the ScoreDocs.
>
>> I don't know if it makes sense, but how about if you execute the query and
>> get the top docs. Then you get the range of docs you need (first 10, second
>> 10). Then you sort the docs based on their appearance in the spans. Then
>> iterate on spans to collect them. You can use just skipTo. You can then
>> either sort back, or if you optimize it, just return the docs in the TopDocs
>> in the order they appeared, but now w/ the spans. I'm sure you get the idea
>> of what I propose, even though I use too many words to describe it :).
>
> Yes, this is what I do, but it involves jumping through hoops, etc. when it
> seems like during scoring we already had the info. Again, I am likely
> willing to trade off the memory and some extra garbage (but not much, I
> suspect) for having to go through the Spans again.
>
> You can also somewhat optimize the iterate over scoredocs case by asking
> whether the Spans.doc() is greater than the ScoreDoc.doc. If it is, then
> you reset the Spans back to the beginning and do a skipTo. Not sure if
> this is faster than the sorting approach.
>
> -Grant
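
For reference, the sort-then-skipTo idea from the quoted message could look roughly like the sketch below. This is only an illustration under stated assumptions: `SpanCursor` and `ArraySpanCursor` are hypothetical stand-ins written for this example, not Lucene's Spans class, and this `skipTo` stays put when the cursor is already at or past the target. Payloads are left out to keep it short.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SpanCollectSketch {

    /** Hypothetical stand-in for a doc-ordered span enumeration (not the Lucene class). */
    public interface SpanCursor {
        boolean next();              // advance to the next span; false when exhausted
        boolean skipTo(int target);  // position on the first span with doc() >= target
        int doc();
        int start();
        int end();
    }

    /** In-memory cursor over {doc, start, end} triples sorted by doc; demo only. */
    public static final class ArraySpanCursor implements SpanCursor {
        private final int[][] spans;
        private int pos = -1;

        public ArraySpanCursor(int[][] spans) { this.spans = spans; }

        public boolean next() { return ++pos < spans.length; }

        public boolean skipTo(int target) {
            if (pos < 0) pos = 0;  // start from the first span on first use
            while (pos < spans.length && spans[pos][0] < target) pos++;
            return pos < spans.length;
        }

        public int doc()   { return spans[pos][0]; }
        public int start() { return spans[pos][1]; }
        public int end()   { return spans[pos][2]; }
    }

    /**
     * Collects {start, end} pairs for one page of hits. pageDocs is in rank
     * order; the spans are walked once in doc-id order, and the result map
     * keeps the original rank order.
     */
    public static Map<Integer, List<int[]>> collect(int[] pageDocs, SpanCursor spans) {
        Map<Integer, List<int[]>> result = new LinkedHashMap<>();
        for (int doc : pageDocs) result.put(doc, new ArrayList<>());

        int[] sorted = pageDocs.clone();
        Arrays.sort(sorted);  // doc-id order, so the cursor only moves forward

        boolean more = true;
        for (int i = 0; i < sorted.length && more; i++) {
            int doc = sorted[i];
            more = spans.skipTo(doc);
            while (more && spans.doc() == doc) {
                result.get(doc).add(new int[] {spans.start(), spans.end()});
                more = spans.next();
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int[][] spans = { {1, 0, 2}, {3, 4, 6}, {3, 9, 11}, {7, 1, 3}, {9, 5, 8} };
        int[] page = {7, 3, 8};  // one page of hits, in score order
        Map<Integer, List<int[]>> out = collect(page, new ArraySpanCursor(spans));
        for (Map.Entry<Integer, List<int[]>> e : out.entrySet()) {
            StringBuilder sb = new StringBuilder("doc " + e.getKey() + ":");
            for (int[] s : e.getValue()) {
                sb.append(" [").append(s[0]).append(',').append(s[1]).append(']');
            }
            System.out.println(sb);
        }
    }
}
```

Pre-filling the map in rank order avoids the "sort back" step: the spans are visited in doc-id order, but the caller still iterates the page in score order. The cost is the second pass over the spans that Grant would like to avoid by collecting during scoring.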