I think the current implementation is slow because it collapses across all of the hit docs. In our environment, a query takes more than 1s with collapse enabled versus only 200-300ms without it. So we could modify it as follows: when the user needs the top 100 docs, we collect the top 200 docs and collapse within those 200. Of course, the results may not be as thoroughly deduplicated as before, but I don't think that's very important; collapsing is not clustering, after all.
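A minimal sketch of the idea, independent of the actual Lucene/Solr collapse code: given hits already sorted by descending score, over-collect twice the requested page size and keep only the best-scoring doc per collapse key. The names (`ScoredDoc`, `collapseTopDocs`) are illustrative, not real Lucene APIs.

```java
import java.util.*;

public class CollapseSketch {
    // Hypothetical stand-in for a scored hit with a collapse field value.
    record ScoredDoc(int docId, String collapseKey, float score) {}

    // `hits` must be sorted by descending score, as a collector would emit them.
    // Instead of collapsing over every hit, look only at the top 2 * want docs.
    static List<ScoredDoc> collapseTopDocs(List<ScoredDoc> hits, int want) {
        int overCollect = Math.min(hits.size(), 2 * want); // collapse window
        Set<String> seen = new HashSet<>();
        List<ScoredDoc> out = new ArrayList<>();
        for (ScoredDoc d : hits.subList(0, overCollect)) {
            // First doc seen per key wins, i.e. the highest-scoring one.
            if (seen.add(d.collapseKey())) {
                out.add(d);
                if (out.size() == want) break;
            }
        }
        return out;
    }
}
```

The trade-off is exactly the one described above: duplicates whose best representative falls outside the 2N window are not collapsed, in exchange for work bounded by the page size rather than the total hit count.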
2010/9/29 Kaktu Chakarabati <jimmoe...@gmail.com>:
> hey guys,
> Any word on this? has anyone did any benchmarking / used this in
> production-like environment?
> We are considering using this feature on a large scale for deduplication
> and was wondering if anyone has some numbers before I go ahead and start
> my own series of tests...
>
> thanks,
> Chak