Re: filtering and chaining Collectors

Adrien Grand Thu, 16 Aug 2018 01:23:09 -0700

I think one reason that we don't want to encourage filtering at the
collector level is that it is much slower than filtering in the query. The
former needs to check hits one by one, while the latter can leapfrog
between iterators to skip documents that don't match.
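
To make the difference concrete, here is a minimal, self-contained sketch
(plain Java, not Lucene code). SortedDocs is a hypothetical stand-in for a
doc ID iterator with an advance(target) method, and the class and method
names are assumptions made for illustration only. The per-hit filter must
look at every hit, whereas the leapfrog loop lets each side jump straight
to the other side's current doc ID.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

class LeapfrogSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Hypothetical stand-in for an iterator over sorted, unique doc IDs. */
    static final class SortedDocs {
        private final int[] docs;
        private int pos = -1;

        SortedDocs(int... docs) { this.docs = docs; }

        /** Current doc ID: -1 before the first advance, NO_MORE_DOCS when exhausted. */
        int docID() {
            if (pos < 0) return -1;
            return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
        }

        /** Jump to the first doc >= target; callers must pass target > docID(). */
        int advance(int target) {
            int lo = Math.min(pos + 1, docs.length);
            int hi = docs.length;
            while (lo < hi) {               // binary search over the remaining docs
                int mid = (lo + hi) >>> 1;
                if (docs[mid] < target) lo = mid + 1; else hi = mid;
            }
            pos = lo;
            return docID();
        }
    }

    /** Query-level filtering: the two iterators leapfrog over non-matching docs. */
    static List<Integer> intersect(SortedDocs a, SortedDocs b) {
        List<Integer> hits = new ArrayList<>();
        int doc = a.advance(0);
        while (doc != NO_MORE_DOCS) {
            // Only advance b if it is behind; it may already sit on doc.
            int other = b.docID() < doc ? b.advance(doc) : b.docID();
            if (other == doc) {
                hits.add(doc);
                doc = a.advance(doc + 1);
            } else {
                doc = a.advance(other);     // skip everything in a below other
            }
        }
        return hits;
    }

    /** Collector-level filtering: every hit is tested, one by one. */
    static List<Integer> filterPerHit(int[] hits, IntPredicate accept) {
        List<Integer> kept = new ArrayList<>();
        for (int doc : hits) {
            if (accept.test(doc)) kept.add(doc);
        }
        return kept;
    }
}
```

Both methods produce the same matches; the point of the sketch is only
that intersect() never visits the docs it jumps over, while filterPerHit()
calls the predicate once per hit.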


On Wed, Aug 15, 2018 at 23:27, Michael Sokolov <[email protected]> wrote:

> Hmm the more I root around, the more crazy it seems to try to thread a
> return value through all the different places collect() gets called from.
> Somehow I thought it would just be one place in IndexSearcher somewhere.
>
> On Wed, Aug 15, 2018 at 5:18 PM Michael Sokolov <[email protected]>
> wrote:
>
> > We have MultiCollector to enable running multiple Collectors on the same
> > hits, in sequence for each hit. I think a nice extension would be to
> > enable filtering so that earlier collectors could reject a hit,
> > preventing later collectors from seeing it. This way you could have a
> > post-filter implemented in one collector, and some other collection,
> > like faceting, in the next one, that wants to ignore hits that are
> > filtered in this post-filter.
> >
> > The implementation idea would be to return a "status" value from
> > LeafCollector.collect() indicating how to proceed. This could also
> > naturally be used for early termination (you could have
> > status=TERMINATE | SKIP | COLLECT, say).
> >
> > I was trying to understand why this wasn't done before for early
> > termination since it seemed so natural to me, and thought - there must
> > be a reason. But I went and read through (skimmed really) the original
> > EarlyTerminatingCollector issue (
> > https://issues.apache.org/jira/browse/LUCENE-4858) and didn't see any
> > discussion of that.
> >
> > Am I missing something here?
> >
> > -Mike
> >
>
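
For reference, the status-returning idea from the quoted proposal could be
sketched roughly as follows. This is not Lucene's API: the Status enum, the
StatusCollector interface, and ChainCollector are hypothetical names, and
the search() loop is a simplified stand-in for whatever code would call
collect(). It only illustrates how an earlier collector could reject a hit
(SKIP) or stop the whole search (TERMINATE) before later collectors see it.

```java
import java.util.List;

class FilterChainSketch {
    /** Hypothetical status values, as suggested in the proposal. */
    enum Status { COLLECT, SKIP, TERMINATE }

    /** Hypothetical collector whose collect() reports how to proceed. */
    interface StatusCollector {
        Status collect(int doc);
    }

    /** Runs collectors in sequence for each hit; the first non-COLLECT status wins. */
    static final class ChainCollector implements StatusCollector {
        private final List<StatusCollector> chain;

        ChainCollector(List<StatusCollector> chain) { this.chain = chain; }

        @Override
        public Status collect(int doc) {
            for (StatusCollector c : chain) {
                Status s = c.collect(doc);
                if (s != Status.COLLECT) {
                    return s;   // later collectors never see this hit
                }
            }
            return Status.COLLECT;
        }
    }

    /** Simplified driver: feeds hits to the collector until it asks to terminate. */
    static void search(int[] hits, StatusCollector collector) {
        for (int doc : hits) {
            if (collector.collect(doc) == Status.TERMINATE) {
                break;
            }
        }
    }
}
```

A post-filter followed by a second collector would then be chained as
new ChainCollector(List.of(postFilter, facets)), with postFilter returning
SKIP for rejected hits.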
