This is also the case for non-distributed search, isn't it? The Lucene-level FirstPassGroupingCollector doesn't actually record the docid of the top doc for each group at the moment, but I don't think there's any reason it couldn't: it's stored in the relevant FieldComparator. And it would be a nice shortcut in GroupingSearch more generally.
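
To make the shortcut concrete, here is a minimal sketch in plain Java of what the merge step could look like if each shard returned the top document together with each group in the first pass. It deliberately uses no Lucene or Solr classes: ShardGroup and mergeTopGroups are hypothetical names invented for illustration, and the sketch assumes groups are ranked by descending relevance score with scores comparable across shards. A real implementation would have to compare the sort values held by the FieldComparator instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FirstPassMergeSketch {

    /** Hypothetical first-pass entry: a group plus the best document a shard saw for it. */
    static final class ShardGroup {
        final String groupValue;
        final String shard;
        final int topDocId;    // shard-local docid of the group's best document
        final float topScore;  // assumption: groups are sorted by score, descending

        ShardGroup(String groupValue, String shard, int topDocId, float topScore) {
            this.groupValue = groupValue;
            this.shard = shard;
            this.topDocId = topDocId;
            this.topScore = topScore;
        }
    }

    /**
     * Merges the per-shard top groups into the global top-k groups, keeping for
     * each group the best document reported by any shard. With one document per
     * group, the result already identifies every document that needs fetching.
     */
    static List<ShardGroup> mergeTopGroups(List<List<ShardGroup>> shardResponses, int topK) {
        Map<String, ShardGroup> best = new HashMap<>();
        for (List<ShardGroup> response : shardResponses) {
            for (ShardGroup g : response) {
                // Keep whichever entry carries the higher-scoring top document.
                best.merge(g.groupValue, g, (a, b) -> b.topScore > a.topScore ? b : a);
            }
        }
        List<ShardGroup> merged = new ArrayList<>(best.values());
        // Highest score first; ties broken by group value to keep the order deterministic.
        merged.sort((a, b) -> {
            int c = Float.compare(b.topScore, a.topScore);
            return c != 0 ? c : a.groupValue.compareTo(b.groupValue);
        });
        return new ArrayList<>(merged.subList(0, Math.min(topK, merged.size())));
    }

    public static void main(String[] args) {
        List<ShardGroup> shard1 = Arrays.asList(
                new ShardGroup("A", "shard1", 3, 2.0f),
                new ShardGroup("B", "shard1", 7, 1.5f));
        List<ShardGroup> shard2 = Arrays.asList(
                new ShardGroup("B", "shard2", 1, 1.8f),
                new ShardGroup("C", "shard2", 4, 0.5f));

        for (ShardGroup g : mergeTopGroups(Arrays.asList(shard1, shard2), 2)) {
            System.out.println(g.groupValue + " -> " + g.shard + "/doc " + g.topDocId);
        }
    }
}
```

In this example group B is reported by both shards, and the merge keeps the entry from shard2 because its top document scores higher. The merged list then holds one (shard, docid) pair per group, which is all the field-fetching stage needs when only one document per group is requested.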

Alan Woodward
www.flax.co.uk

> On 30 Mar 2017, at 14:26, Diego Ceccarelli <diego.ceccare...@gmail.com> wrote:
>
> Hello, I'm currently working on Solr grouping in order to support
> reranking [1].
> I have a working patch for non-distributed search, and I'm now working on
> the distributed setting.
>
> Looking at the code, distributed grouping search (top-k groups, top-n
> documents for each group) consists of:
>
> GROUPING_DISTRIBUTED_FIRST
> 1. given the grouping query, each shard will return its top-k groups
> 2. the federator will merge the top-k groups and produce the top-k groups
>    for the query
>
> GROUPING_DISTRIBUTED_SECOND
> 1. given the top-k groups, each shard will return its top-n documents for
>    each group
> 2. the federator will then compute the top-n documents for each group by
>    merging all the shard responses
>
> GET_FIELDS
> as usual
>
> My plan was to change the collector in GROUPING_DISTRIBUTED_SECOND and
> return the top documents for each group with a new score given by the
> function used to rerank (affecting maxScore for each group, and then also
> the order of the groups).
> Looking at the code, I then realized that TopGroups asserts that the order
> of the groups does not change, and that indeed _if the ranking function is
> the same, group order can't change after the first stage_.
>
> My question is: if the user is interested only in the top document for each
> group (i.e., the default: group.limit = 1), do we really need
> GROUPING_DISTRIBUTED_SECOND, or could we skip it?
> Is there any reason to perform GROUPING_DISTRIBUTED_SECOND in this case? Or
> could we just return the top docid together with the top groups in
> GROUPING_DISTRIBUTED_FIRST and then go directly to GET_FIELDS?
>
> Cheers,
> Diego
>
> [1] https://issues.apache.org/jira/browse/SOLR-8542