I have 12_000_000 documents and 6_500_000 groups.

With sort: it takes around 1 sec without grouping, 2 sec with grouping and 12 sec with setAllGroups(true).
Without sort: it takes around 0.2 sec without grouping, 0.6 sec with grouping and 10 sec with setAllGroups(true).
Thank you, Erick, I will look into it.

On Fri, Oct 9, 2020 at 14:32, Erick Erickson <erickerick...@gmail.com> wrote:
> At the Solr level, there is CollapsingQParserPlugin; see:
> https://lucene.apache.org/solr/guide/8_6/collapse-and-expand-results.html
>
> You could perhaps steal some ideas from that if you
> need this at the Lucene level.
>
> Best,
> Erick
>
> > On Oct 9, 2020, at 7:25 AM, Diego Ceccarelli (BLOOMBERG/ LONDON) <dceccarel...@bloomberg.net> wrote:
> >
> > Is the field that you are using to dedupe stored as a docvalue?
> >
> > From: java-user@lucene.apache.org At: 10/09/20 12:18:04 To: java-user@lucene.apache.org
> > Subject: Deduplication of search result with custom sort
> >
> > Hi,
> > I need to deduplicate search results by a specific field and I have no idea
> > how to implement this properly.
> > I have tried grouping with setGroupDocsLimit(1) and it gives me the expected
> > results, but its performance is not very good.
> > I think that I need something like DiversifiedTopDocsCollector, but
> > suitable for collecting TopFieldDocs.
> > Is there any way to achieve deduplication with existing Lucene
> > components, or do I need to implement my own DiversifiedTopFieldsCollector?
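To make the "collapse" idea Erick points to concrete, here is a minimal plain-Java sketch, not Lucene code: keep only the best hit per value of the dedup field under the custom sort, then take the top N of those representatives. The class `DedupTopN`, the `Hit` record and `topNDeduplicated` are hypothetical names, and the sketch assumes the group key and sort value of every hit can be read cheaply (in Lucene that would typically mean doc values, which seems to be what Diego's question is getting at). It uses records, so it needs Java 16 or later.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, Lucene-free sketch of "collapse, then sort":
// keep the best hit per group key, then return the top N representatives.
public class DedupTopN {

    // One search hit: document id, value of the dedup field, and the custom sort value.
    // Assumption: both values are cheap to read per hit (e.g. from doc values in Lucene).
    record Hit(int docId, String groupKey, long sortValue) {}

    // Custom sort: higher sortValue first, ties broken by lower docId.
    static final Comparator<Hit> BY_SORT =
            Comparator.comparingLong(Hit::sortValue).reversed()
                    .thenComparingInt(Hit::docId);

    static List<Hit> topNDeduplicated(Iterable<Hit> hits, int n) {
        // Best hit seen so far for each group key; memory grows with distinct groups.
        Map<String, Hit> bestPerGroup = new HashMap<>();
        for (Hit hit : hits) {
            // Replace the current representative only if the new hit sorts before it.
            bestPerGroup.merge(hit.groupKey(), hit,
                    (current, candidate) -> BY_SORT.compare(candidate, current) < 0 ? candidate : current);
        }
        // Sort the group representatives and keep at most n of them.
        List<Hit> winners = new ArrayList<>(bestPerGroup.values());
        winners.sort(BY_SORT);
        return new ArrayList<>(winners.subList(0, Math.min(n, winners.size())));
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(
                new Hit(0, "a", 5),
                new Hit(1, "b", 9),
                new Hit(2, "a", 7),
                new Hit(3, "c", 1),
                new Hit(4, "b", 3));
        // Prints "1 b 9" then "2 a 7": one hit per group, in sort order.
        for (Hit h : topNDeduplicated(hits, 2)) {
            System.out.println(h.docId() + " " + h.groupKey() + " " + h.sortValue());
        }
    }
}
```

The map holds one entry per distinct group that matches, so with around 6_500_000 groups its memory cost is proportional to that count; this is the main trade-off to weigh against a grouping-based approach.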