Hey Li,

Thanks, great answer. It touched on exactly the points I was interested in.
One last question: once you tweaked it to work in a 'top K' way, what was the
performance impact like? I've written similar components in the past that
iterate over the top result set docs (on the order of 400-600 top results),
and these would usually run in no more than 4-5ms. Is this close to the
numbers you're seeing for this component?

Thanks,
Chak

On Tue, Sep 28, 2010 at 5:14 PM, Li Li <fancye...@gmail.com> wrote:
> I think the current implementation is slow because it collapses over all
> the hit docs. In our environment, it takes more than 1s when using
> collapse and only 200ms-300ms when not. So we modified it: when the user
> needs the top 100 docs, we collect the top 200 docs and collapse within
> those 200 docs. Of course, the user may not see as many duplicated docs
> as before, but I think that's not so important. Anyway, collapsing is
> not clustering.
>
> 2010/9/29 Kaktu Chakarabati <jimmoe...@gmail.com>:
> > Hey guys,
> > Any word on this? Has anyone done any benchmarking or used this in a
> > production-like environment?
> > We are considering using this feature on a large scale for
> > deduplication, and I was wondering if anyone has some numbers before
> > I go ahead and start my own series of tests...
> >
> > Thanks,
> > Chak
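
For reference, the 'top K' tweak Li describes (collect roughly twice as many hits as requested and collapse only within that window) could be sketched as below. This is only an illustration of the idea, not the actual Solr patch: the function name `collapse_top_k`, the `overfetch` factor, and the `(doc_id, collapse_key)` tuple layout are all assumptions made for the example.

```python
from itertools import islice
from typing import Callable, Hashable, Iterable, List, TypeVar

Doc = TypeVar("Doc")


def collapse_top_k(
    ranked_docs: Iterable[Doc],
    k: int,
    collapse_key: Callable[[Doc], Hashable],
    overfetch: int = 2,  # hypothetical factor: 2 matches "top 100 -> collect 200"
) -> List[Doc]:
    """Collapse duplicates only within the first k * overfetch ranked hits.

    The cost is bounded by the window size instead of the total hit count.
    The trade-off: duplicates outside the window are never examined, and
    fewer than k docs may come back if the window is full of duplicates.
    """
    # Only the top of the ranking is ever touched; the rest is not consumed.
    window = islice(ranked_docs, k * overfetch)

    seen = set()
    collapsed: List[Doc] = []
    for doc in window:
        key = collapse_key(doc)
        if key in seen:
            continue  # a higher-ranked doc with this key was already kept
        seen.add(key)
        collapsed.append(doc)
        if len(collapsed) == k:
            break
    return collapsed


# Usage: docs are (doc_id, collapse_key) pairs, already sorted by score.
hits = [("a", 1), ("b", 1), ("c", 2), ("d", 3), ("e", 2), ("f", 4)]
top2 = collapse_top_k(hits, k=2, collapse_key=lambda d: d[1])
```

Because the loop never looks past the window, the work per query depends on K rather than on the number of matching docs, which is the property that makes the per-query cost comparable to other components that iterate only over the top results.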