BTW Mike, I noticed that TopFieldDocCollector extends TopScoreDocCollector.
That is a problem if we want to make TSDC final. Now, TFDC is marked
deprecated, so it will be removed in the future.
I think an easy fix is just to have TFDC extend TopResultsCollector, right?

On Thu, Mar 26, 2009 at 2:52 PM, Shai Erera <ser...@gmail.com> wrote:

> I may have lost the context here... but here's what I thought we were
>> talking about.
>>
>> If we choose the interface option (adding a "ProvidesTopDocsResults"
>> (say) interface), then you would create method
>> renderResults(ProvidesTopDocResults).
>>
>> Then, any collector implementing that interface could be safely passed
>> in, as the upcast is done at compile time not runtime.
>>
>
> Consider this code snippet:
>
> HitCollector hc = condition? new TopDocCollector() :
> TopFieldDocCollector();
> searcher.search(hc);
>
> The problem is that I need a base class for both collectors. If I use the
> interface ProvidesTopDocsResults, then I cannot pass it to searcher, w/o
> casting to HitCollector. If I use HitCollector, then I need to cast it
> before passing it into rederResults(). Only when both class have the same
> base class which is also a HitCollector, I don't need any casting. I.e.,
> after I submit a patch that develops what we've agreed on, the code can look
> like this:
>
> TopResultsCollector trc = condition ? new TopScoreDocCollector() : new
> TopFieldCollector();
> searcher.search(trc);
> renderResults(trc);
>
> Here I can pass 'trc' to both methods since it both a HitCollector and a
> TopResultsCollector. That's what I was missing in your proposal.
>
> Shai do you want to take a first cut at making a patch?  Can you open
>> an issue?  Thanks.
>
>
> I can certainly do that. I think the summary of the steps make sense. I'll
> check if TopScoreDocCollector and TopFieldCollector can also extend that
> "you provide your own pqueue and we collect top N according to it"
> collector, passing a null PQ and extending topDocs().
>
> I also would like to propose an additional method to topDocs(), topDocs(int
> start, int howMany) which will be more efficient to call in case of paging
> through results. The reason is that topDocs() pops everything from the PQ,
> then allocates a ScoreDoc[] array of size of number of results and returns a
> TopDocs object. You can then choose just the ones you need.
> On the other hand, topDocs(start, howMany) would allocate exactly the size
> of array needed. E.g., in case someone pages through results 91-100, you
> allocate an array of size 10, instead of 100.
> It is not a huge improvement, but it does save some allocations, as well as
> it's a convenient method.
>
> BTW, I like the name ResultsCollector, as it's just like HitCollector, but
> does not commit too much to "hits" .. i.e., facets aren't hits ... I think?
>
> Shai
>
> On Thu, Mar 26, 2009 at 12:55 PM, Michael McCandless <
> luc...@mikemccandless.com> wrote:
>
>> Shai Erera <ser...@gmail.com> wrote:
>>
>> >> The difference is for the new code, it's an upcast, which catches any
>> >> errors at compile time, not run time.  The compiler determines that
>> >> the class implements the required interface.
>> >
>> > I still don't understand how a compiler can detect at compilation time
>> that
>> > a HitCollector instance that is used in method A, and is casted to a
>> > TopDocsOutput instance by calling method B (from A) is actually ok ... I
>> may
>> > be missing some Java information here, but I simply don't see how that
>> > happens at compilation time instead of at runtime ...
>>
>> I may have lost the context here... but here's what I thought we were
>> talking about.
>>
>> If we choose the interface option (adding a "ProvidesTopDocsResults"
>> (say) interface), then you would create method
>> renderResults(ProvidesTopDocResults).
>>
>> Then, any collector implementing that interface could be safely passed
>> in, as the upcast is done at compile time not runtime.
>>
>> > So in fact the internal Lucene code expects only MRHC from a certain
>> point,
>> > and so even if I wrote a HC and passed it on Searcher, it's still
>> converted
>> > to MRHC, with an empty setNextReader() method implementation. That's why
>> I
>> > meant that HC is already deprecated, whether it's marked as deprecated
>> or
>> > not.
>>
>> The setNextReader() impl is not empty; it does the re-basing of docID
>> on behalf of the HC.
>>
>> > What you say about deprecating HC to me is unnecessary. Simply pull
>> > setNextReader up with an empty implementation, get rid of all the
>> > instanceof, casting and wrapping code and you're fine. Nothing is
>> broken.
>> > All works well and better (instanceof, casting and wrapping have their
>> > overheads).
>> > Isn't that right?
>>
>> I think we need to deprecate HC, in favor of MRHC (or if we can think
>> of a better name... ResultCollector?).
>>
>> > Regarding interfaces .. I don't think they're that bad. Perhaps a
>> different
>> > viewing angle might help. Lucene processes a query and fires events.
>> Today
>> > it fires an event every time a doc's score has been computed, and
>> recently
>> > every time it starts to process a different reader. HitCollector is a
>> > listener implementation on the "doc-score event", while MRHC is a
>> listener
>> > on both.
>> > To me, interfaces play just nicely here. Assume that you have the
>> following
>> > interfaces:
>> > - DocScoreEvent
>> > - ChangeReaderEvent
>> > - EndProcessingEvent (thrown when the query has finished processing -
>> > perhaps to aid collectors to free resources)
>> > - <any other events you foresee?>
>> > The Lucene code receives a HitCollector which listens on all events. In
>> the
>> > future, Lucene might throw other events, but still get a HitCollector.
>> Those
>> > methods will check for instanceof, and you as a user will know that if
>> you
>> > want to catch those events, you pass in a collector implementation which
>> > does. Those events cannot of course be main-stream events (like
>> > DocScoreEvent), but new ones, perhaps even experimental.
>> > Since HitCollector is a concrete class, we can always add new interfaces
>> to
>> > it in the future with empty implementations?
>>
>> I agree interfaces clearly have useful properties, but their achilles
>> heel for Lucene in all but the most trivial needs is the
>> non-back-compatibility when you want to add a method to the interface.
>> There have been a number of java-dev discussons on this problem.
>>
>> So, I think something like this:
>>
>>  * Deprecate HitCollector in favor of MultiReaderHitCollector (any
>>    better name here?) abstract class.  If you want to make a fully
>>    custom HitCollector, you subclass this calss.
>>
>>    Let's change MRHC's collect to take only an [unbased] docID, and
>>    expose a setScorer(Scorer) method.  Then if the collector needs
>>    score it can call Scorer.score().
>>
>>  * Subclass that to create an abstract "tracks top N results
>>    collector" (TopDocsCollector?  TopHitsCollector?)
>>
>>  * Subclass TopDocsCollector to a final, fast "top N sorted by score"
>>    collector (exists already: TopScoreDocCollector)
>>
>>  * Subclass TopDocsCollector to a final, fast "top N sorted by field"
>>    collector (exists already: TopFieldCollector)
>>
>>  * Subclass TopDocsCollector to a "you provide your own pqueue and we
>>    collect top N according to it" collector (does not yet exist --
>>    name?).  This is the way forward for existing subclasses of
>>    TopDocCollector.
>>
>> Shai do you want to take a first cut at making a patch?  Can you open
>> an issue?  Thanks.
>>
>> Mike
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-dev-h...@lucene.apache.org
>>
>>
>

Reply via email to