I think the current implementation is slow because it collapses over all
of the hit docs. In our environment, my estimate is that a query takes
more than 1s with collapse enabled and only 200-300ms without it. So we
modify it like this: when the user needs the top 100 docs, we collect the
top 200 docs and do the collapse only within those 200. Of course,
duplicates may not be collapsed as thoroughly as before, but I don't
think that's very important; after all, collapsing is not clustering.
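To make the idea concrete, here is a minimal sketch of that scheme. It is
not the actual patch code, just plain Lucene; the collapse key is assumed
to be a stored string field, and names like collapseTopDocs and
collapseField are only illustrative:

import java.io.IOException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

import org.apache.lucene.document.Document;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;

public class TopNCollapse {

    /**
     * Collect the top (2 * wanted) hits and collapse duplicates by a
     * stored field value within that window, keeping at most 'wanted'
     * docs. Duplicates below the window are not collapsed.
     */
    public static List<ScoreDoc> collapseTopDocs(IndexSearcher searcher,
                                                 Query query,
                                                 String collapseField,
                                                 int wanted) throws IOException {
        // Over-collect: e.g. fetch 200 hits when the caller wants 100.
        TopDocs topDocs = searcher.search(query, wanted * 2);

        Set<String> seen = new HashSet<String>();
        List<ScoreDoc> collapsed = new ArrayList<ScoreDoc>();

        for (ScoreDoc sd : topDocs.scoreDocs) {
            Document doc = searcher.doc(sd.doc);
            String key = doc.get(collapseField);

            // Keep the first (highest-scoring) doc per collapse key;
            // docs with no collapse value are always kept.
            if (key == null || seen.add(key)) {
                collapsed.add(sd);
                if (collapsed.size() >= wanted) {
                    break;
                }
            }
        }
        return collapsed;
    }
}

The over-collection factor (2x here) would be the knob to turn: a larger
window collapses more duplicates but costs more time per query.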

2010/9/29 Kaktu Chakarabati <jimmoe...@gmail.com>:
> hey guys,
> Any word on this? Has anyone done any benchmarking / used this in a
> production-like environment?
> We are considering using this feature on a large scale for deduplication and
> was wondering
> if anyone has some numbers before I go ahead and start my own series of
> tests...
>
>
> thanks,
> Chak
>
