Hi TopDocCollector's (TDC) implementation of collect() seems a bit problematic to me.
if (reusableSD == null) { reusableSD = new ScoreDoc(doc, score); } else if (score >= reusableSD.score) { // reusableSD holds the last "rejected" entry, so, if // this new score is not better than that, there's no // need to try inserting it reusableSD.doc = doc; reusableSD.score = score; } else { return; } reusableSD = (ScoreDoc) hq.insertWithOverflow(reusableSD); The first 'if' is safe, as it assumes that if reusableSD is null, then there is more room in PQ. The second (else if) is a bit problematic. It assumes that if the given score is less than the latest item rejected from PQ, then it should not be added. The problem I see here is if someone extends TDC and provides his own PQ, which for example, sorts based on the lowest scoring documents, or just by document IDs. TDC has a protected constructor which allows that. In that case, comparing the current score to the latest rejected ScoreDoc is wrong. Also, PQ itself will check that (using lessThan) before attempting to call the insert itself. On the other hand, this check saves two method calls (PQ.insert and PQ.lessThan) in case someone wants to sort by top ranking documents, which is the default behavior. Of course we could say to that person who wants to sort by docIDs, to extend TDC and override collect() itself, but I would like to propose an alternative, which will allow extending TDC by providing a different PQ, while still using its collect() method. Introduce in TDC a private boolean which signals whether the default PQ is used or not. If it's not used, don't do the 'else if' at all. If it is used, then the 'else if' is safe. Then code could look like: if (reusableSD == null) { reusableSD = new ScoreDoc(doc, score); } else if (useDefaultPQ){ if (score >= reusableSD.score) { // reusableSD holds the last "rejected" entry, so, if // this new score is not better than that, there's no // need to try inserting it reusableSD.doc = doc; reusableSD.score = score; } else { return; } } else { reusableSD.doc = doc; reusableSD.score = score; } reusableSD = (ScoreDoc) hq.insertWithOverflow(reusableSD); I don't think that adding useDefaultPQ will have any performance implications, but it does add one extra boolean check. Therefore if you agree that this should be fixed in TDC, rather than forcing anyone who provides his own PQ to also override collect() (in the cases I've noted above), then I can submit a patch. Shai