On Tue, Mar 24, 2009 at 02:47:07PM +0200, Shai Erera wrote:

> I agree about the unnecessary method call - we should make a collector's
> implementation as efficient as possible.

Maybe it makes sense to just bite the bullet and duplicate the unrolled code?
There's precedent: ScorerDocQueue is not a subclass of PriorityQueue.

> But what about cases like collectors chaining, extensions and running w/
> several collectors? If each collector needs to request the
> document's score, it might be computed over and over again. Consider a case,
> for example, of a TopScoreDocCollector, running together w/ another
> collector that extracts information about the document, and uses the score
> to compute something (easy to write a collector which delegates the
> collect() call to each of them). Today, I could just call collect(doc,
> score) on each collector. But in the proposed way, I'd call collect(doc) and
> then each of them will need to request the score.

In such a case, perhaps it would be possible to supply a trivial Scorer
wrapper subclass which caches a score. Then you still have the overhead of
the method calls, but not the overhead of calculating the score.

That's not ideal, but I think the case of matching-without-scoring is more
important to optimize for.

> Perhaps we can introduce a collect(doc) on HitCollector which does not
> accept score, but keep the other collect? I am not sure if that's any
> better, because then the Lucene search code would need to decide which
> collect method to call ...

Also, passing arguments is dirt cheap. A HitCollector that only cares about
adding doc nums to a BitVector can just ignore the second argument.

> Regarding the TopDocs interface (we should have a better name as TopDocs is
> already taken),

"Winners".

Marvin Humphrey
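
For the chained-collector case, a rough sketch of the score-caching wrapper
idea might look like the following. It is self-contained and does not use
Lucene's real classes: SimpleScorer, ScoreCachingScorer and CountingScorer
are hypothetical names made up for illustration, and the assumption is that
a scorer exposes the current doc number and a score() call that is expensive
to evaluate.

```java
// Hypothetical stand-in for a scorer; NOT Lucene's actual Scorer class.
interface SimpleScorer {
    int doc();      // doc number the scorer is currently positioned on
    float score();  // score of the current doc (assumed costly to compute)
}

// Wrapper that computes the score at most once per document.
final class ScoreCachingScorer implements SimpleScorer {
    private final SimpleScorer in;
    private int cachedDoc = -1;   // -1 means nothing has been cached yet
    private float cachedScore;

    ScoreCachingScorer(SimpleScorer in) {
        this.in = in;
    }

    public int doc() {
        return in.doc();
    }

    public float score() {
        int doc = in.doc();
        if (doc != cachedDoc) {
            // First request for this doc: compute once and remember it.
            cachedScore = in.score();
            cachedDoc = doc;
        }
        return cachedScore;
    }
}

// Toy scorer that counts how often score() is really evaluated.
final class CountingScorer implements SimpleScorer {
    int currentDoc = 0;
    int scoreCalls = 0;

    public int doc() {
        return currentDoc;
    }

    public float score() {
        scoreCalls++;
        return 1.0f / (currentDoc + 1);
    }
}
```

Each chained collector would be handed the same ScoreCachingScorer, so however
many of them ask for the score of a given doc, the wrapped scorer computes it
once. The extra method call per request remains; only the score calculation
is saved.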