Hey Li,
Thanks, great answer; it touched on exactly the points I was interested in.

One last question: once you tweaked it to work in a 'top K' way, what was
the performance impact like?
I've written similar components in the past that iterate over the top result
set docs (on the order of 400-600 top results), and these usually ran in no
more than 4-5 ms. Is that close to the numbers you're seeing for this
component?
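For context, the 'top K' tweak Li describes in the quoted reply below (for a request of K docs, collect 2K hits and collapse only within that window) might look roughly like the following Java sketch. It is not the actual collapsing component's code: the `Hit` class, `collapseTopK`, and the `oversample` parameter are hypothetical names, and the sketch assumes the hits already arrive sorted by descending score.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class TopKCollapse {

    // Hypothetical hit type: a doc id, its score, and the field value used for collapsing.
    static final class Hit {
        final int docId;
        final float score;
        final String collapseKey;

        Hit(int docId, float score, String collapseKey) {
            this.docId = docId;
            this.score = score;
            this.collapseKey = collapseKey;
        }
    }

    /**
     * Collapses only the first k * oversample hits of an already score-sorted list,
     * keeping the highest-ranked hit per collapse key and returning at most k hits.
     */
    static List<Hit> collapseTopK(List<Hit> rankedHits, int k, int oversample) {
        // Only this window is examined, instead of every matching doc.
        int window = Math.min(rankedHits.size(), k * oversample);
        Set<String> seenKeys = new HashSet<>();
        List<Hit> result = new ArrayList<>(k);
        for (int i = 0; i < window && result.size() < k; i++) {
            Hit hit = rankedHits.get(i);
            // Set.add returns false if the key was already seen: that hit is a duplicate.
            if (seenKeys.add(hit.collapseKey)) {
                result.add(hit);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(
            new Hit(1, 9.0f, "a"),
            new Hit(2, 8.5f, "a"),
            new Hit(3, 8.0f, "b"),
            new Hit(4, 7.0f, "c"),
            new Hit(5, 6.0f, "d"));
        // k = 2, window = 4: hit 2 collapses into hit 1, so docs 1 and 3 are printed.
        for (Hit h : collapseTopK(hits, 2, 2)) {
            System.out.println(h.docId);
        }
    }
}
```

The cost is bounded by the window size rather than the total hit count. The trade-off is that if the window is dominated by near-duplicates, fewer than K results come back, and duplicates outside the window are never collapsed, which matches Li's point that users may not see as many duplicated docs removed.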

Thanks,
Chak

On Tue, Sep 28, 2010 at 5:14 PM, Li Li <fancye...@gmail.com> wrote:

> I think the current implementation is slow because it collapses over all
> the hit docs. In our environment, a query takes more than 1s with
> collapsing and only 200-300ms without it. So we modified it: when the
> user needs the top 100 docs, we collect the top 200 docs and collapse
> within those 200. Of course, users may not see as many duplicated docs
> removed as before, but I don't think that's very important. Anyway,
> collapsing is not clustering.
>
> 2010/9/29 Kaktu Chakarabati <jimmoe...@gmail.com>:
> > Hey guys,
> > Any word on this? Has anyone done any benchmarking or used this in a
> > production-like environment?
> > We are considering using this feature on a large scale for deduplication
> > and were wondering if anyone has some numbers before I go ahead and
> > start my own series of tests...
> >
> >
> > thanks,
> > Chak
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>