ok I missed 1483 completely.

As a side comment, why not add setNextReader to HitCollector and then a
getDocId(int doc) method which will do the doc + base arithmetic? I think
it's very easy for someone to forget to add that (+ base) to doc. You could
then just change TopDocCollector to call getDocId() instead of duplicating
it into TopScoreDocCollector.

Isn't that something you'd want all HitCollector implementations to use? I
consider some extensions of HitCollector we have - we now will probably want
to change them to extend MultiReaderHitCollector, but we'll have to remember
to do that +base arithmatic everywhere, instead of calling getDocId(). I
understand that changing the call to getDocId is the same as adding "+
base", from an effort perspective, but I think it's better this way. It does
involve an additional method call, but I wonder how good compilers will
handle that.

Anyway, I don't want to add topDocs and getTotalHits to HitCollector, it
will destroy its generic purpose. An interface is also problematic, as it
just means all of these collectors have these methods declared, but they
need to implement them. An abstract class grants you w/ both.

So in case you agree that the logic of MultiReaderHitCollector can (and
should?) be in HitCollector, we can create an abstract class called
ScoringCollector (or if nobody objects TopDocsCollector) which will
implement these two methods.
In case you disagree, we can have that abstract class extend
MultiReaderHitCollector instead.

I'm in favor of the first option as at least as it looks in the code,
HitCollector is not extended by any class anymore, except TopDocCollector
which is marked as deprecated, and 3 anonymous implementations. So it looks
like HitCollector itself is "deprecated" as far as the Lucene core code sees
it.

What do you think?

Shai

On Mon, Mar 23, 2009 at 4:43 PM, Michael McCandless <
luc...@mikemccandless.com> wrote:

> > If we're already creating a new TopScoreDocCollector (when was it
> > added?  I must have been dozing off while this happened...)
>
> This was LUCENE-1483.
>
> > How about if we introduce an abstract ScoringCollector (about the
> > name later) which implements topDocs() and getTotalHits() and there
> > will be several implementations of it, such as:
> > TopScoreDocCollector, which sorts the documents by their score, in
> > descending order only, TopFieldDocCollector - for sorting by fields,
> > and additional sort-by collectors.
>
> This sounds good... but the challenge is we also need to get both
> HitCollector and MultiReaderHitCollector in there.
>
> HitCollector is the simplest way to create a custom collector.
> MultiReaderHitCollector (added with LUCENE-1483) is the more
> performant way, since it lets your collector operate per-segment.  All
> non-deprecated core collectors in Lucene now subclass
> MultiReaderHitCollector.
>
> So would we make separate subclasses for each of them to add
> getTotalHits() / topDocs()?  EG TopDocsHitCollector and
> TopDocsMultiReaderHitCollector?  It's getting confusing.
>
> Or maybe we just add totalHits() and topDocs() to HitCollector even
> though for advanced case (non-top-N-collection) the methods would not
> be used?
>
> Or... maybe this is a time when an interface is the lesser evil: we
> could make a TopDocs interface that the necessary classes implement?
>
> Mike
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-dev-h...@lucene.apache.org
>
>

Reply via email to