Shai Erera <ser...@gmail.com> wrote:

> As a side comment, why not add setNextReader to HitCollector and
> then a getDocId(int doc) method which will do the doc + base
> arithmetic?
One problem is this breaks back compatibility on any current
subclasses of HitCollector.

Another problem is: not all collectors would need to add the base on
each doc.  EG a collector that puts hits into separate pqueues per
segment could defer the addition until the end when only the top
results are pulled out of each pqueue.

Also, I am concerned about the method call overhead.  This is the
absolute ultimate hot spot for Lucene and we should worry about
causing even a single added instruction in this path.

That said... I would like to [eventually] change the collection API
along the lines of what Marvin proposed for "Matcher" in Lucy, here:

    http://markmail.org/message/jxshhiqr6wvq77xu

Specifically, I think it should be the collector's job to ask for the
score for this doc, rather than Lucene's job to pre-compute it, so
that collectors that don't need the score won't waste CPU.  EG, if you
are sorting by field (and don't present the relevance score) you
shouldn't compute it.

Then, we could add other "somewhat expensive" things you might
retrieve, such as a way to ask which terms participated in the match
(discussed today on java-user), and/or all term positions that
participated (discussed in LUCENE-1522).  EG, a top doc collector
could choose to call these methods only when the doc was competitive.

> Anyway, I don't want to add topDocs and getTotalHits to
> HitCollector, it will destroy its generic purpose.

I agree.

> An interface is also problematic, as it just means all of these
> collectors have these methods declared, but they need to implement
> them. An abstract class grants you w/ both.

I'm confused on this objection -- only collectors that do let you ask
for the top N set of docs would implement this interface?  (Ie it'd
only be the TopXXXCollector's that'd implement the interface).  While
interfaces clearly have the future problem of back-compatibility, this
case may be simple enough to make an exception.
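As a rough sketch of the two collector ideas earlier in this reply
(deferring the doc + base addition until the top results are pulled,
and letting the collector ask for the score itself), something like the
following could work.  All names here (ScoreSource,
DeferredBaseCollector, setNextSegment, CountingCollector) are made up
for illustration and are not Lucene API; it only shows the shape of the
idea, not a proposed implementation:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical callback: the collector pulls the score only if it wants it. */
interface ScoreSource {
    float score();
}

/** Hypothetical top-N collector that keeps one queue per segment. */
final class DeferredBaseCollector {
    /** A hit; doc is segment-relative until topDocs() rebases it. */
    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    private final int topN;
    private final List<Integer> bases = new ArrayList<>();
    private final List<PriorityQueue<Hit>> queues = new ArrayList<>();
    private PriorityQueue<Hit> current;

    DeferredBaseCollector(int topN) { this.topN = topN; }

    /** Start a new segment: remember its base and open a fresh min-heap. */
    void setNextSegment(int docBase) {
        current = new PriorityQueue<>(Comparator.comparingDouble((Hit h) -> h.score));
        bases.add(docBase);
        queues.add(current);
    }

    /** Hot path: no base arithmetic here, the doc id stays segment-relative. */
    void collect(int segmentDoc, ScoreSource scorer) {
        float score = scorer.score();
        if (current.size() < topN) {
            current.add(new Hit(segmentDoc, score));
        } else if (score > current.peek().score) {
            current.poll();
            current.add(new Hit(segmentDoc, score));
        }
    }

    /** Merge the queues; the base is added only for hits that survived. */
    List<Hit> topDocs() {
        List<Hit> merged = new ArrayList<>();
        for (int i = 0; i < queues.size(); i++) {
            int base = bases.get(i);
            for (Hit h : queues.get(i)) {
                merged.add(new Hit(h.doc + base, h.score));
            }
        }
        merged.sort((a, b) -> Float.compare(b.score, a.score));
        return merged.subList(0, Math.min(topN, merged.size()));
    }
}

/** A collector that does not need scores simply never calls score(). */
final class CountingCollector {
    int totalHits;
    void collect(int segmentDoc, ScoreSource scorer) { totalHits++; }
}
```

With this shape, at most topN additions per segment happen at the end,
instead of one per collected doc, and a collector like
CountingCollector never triggers score computation at all.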
> So it looks like HitCollector itself is "deprecated" as far as the
> Lucene core code sees it.

I think HitCollector has a purpose, which is to be the simplest way to
make a custom collector.  Ie I think it makes sense to offer a simple
way and a high performance way.

Mike