Hello, I'm currently working on Solr grouping in order to support reranking [1]. I have a working patch for non-distributed search, and I'm now working on the distributed setting.
Looking at the code of distributed grouping (top-k groups, top-n documents for each group), the search consists of:

GROUPING_DISTRIBUTED_FIRST
 1. Given the grouping query, each shard returns its top-k groups.
 2. The federator merges the shard responses and produces the top-k groups for the query.

GROUPING_DISTRIBUTED_SECOND
 1. Given the top-k groups, each shard returns its top-n documents for each group.
 2. The federator then computes the top-n documents for each group by merging all the shard responses.

GET_FIELDS as usual.

My plan was to change the collector in GROUPING_DISTRIBUTED_SECOND and return the top documents for each group with a new score given by the reranking function (which affects maxScore for each group, and therefore also the order of the groups). Looking at the code, I then realized that TopGroups asserts that the order of the groups does not change, and that indeed _if the ranking function is the same, the group order can't change after the first stage_.

My question is: if the user is interested only in the top document for each group (i.e., the default, group.limit=1), do we really need GROUPING_DISTRIBUTED_SECOND, or could we skip it? Is there any reason to perform GROUPING_DISTRIBUTED_SECOND in this case, or could we just return the top docid together with the top groups in GROUPING_DISTRIBUTED_FIRST and then go directly to GET_FIELDS?

Cheers,
Diego

[1] https://issues.apache.org/jira/browse/SOLR-8542
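To make the group.limit=1 question concrete, here is a minimal Java sketch of the proposed shortcut. It is not Solr's actual code: ShardGroup, mergeTopGroups and the topDocId field are hypothetical names. It assumes that groups are sorted by score desc and that documents within each group are also sorted by score desc, so a group's sort value is simply the score of its best document. Under that assumption, if each shard also sends the docid of each group's best document in GROUPING_DISTRIBUTED_FIRST, the merged top-k groups already carry the top document for each group. Ties aside, a group whose best document sits on a shard that did not report it cannot reach the merged top-k, because that shard already had k other groups scoring at least as high.

```java
import java.util.*;

public class FirstPhaseTopDocMerge {

    // Hypothetical first-phase entry a shard might return: the group value,
    // the score of the group's best document on that shard, and (the proposed
    // addition) that document's id plus the shard it lives on.
    record ShardGroup(String groupValue, float topScore, String topDocId, int shard) {}

    // Merge per-shard top-k groups into the global top-k, assuming both the
    // group sort and the within-group sort are "score desc". With
    // group.limit=1, the winning entry's topDocId is the group's top document.
    static List<ShardGroup> mergeTopGroups(List<List<ShardGroup>> shardResponses, int k) {
        Map<String, ShardGroup> best = new HashMap<>();
        for (List<ShardGroup> response : shardResponses) {
            for (ShardGroup g : response) {
                // For each group value, keep the entry with the highest score
                // reported by any shard.
                best.merge(g.groupValue(), g,
                        (current, candidate) ->
                                candidate.topScore() > current.topScore() ? candidate : current);
            }
        }
        List<ShardGroup> merged = new ArrayList<>(best.values());
        // Order groups by their best score, descending; ties are broken by
        // group value only to make the output deterministic.
        merged.sort(Comparator.comparingDouble(ShardGroup::topScore).reversed()
                .thenComparing(ShardGroup::groupValue));
        return new ArrayList<>(merged.subList(0, Math.min(k, merged.size())));
    }

    public static void main(String[] args) {
        List<ShardGroup> shard0 = List.of(
                new ShardGroup("a", 3.0f, "d1", 0),
                new ShardGroup("b", 2.0f, "d2", 0));
        List<ShardGroup> shard1 = List.of(
                new ShardGroup("b", 5.0f, "d7", 1),
                new ShardGroup("c", 1.5f, "d9", 1));
        for (ShardGroup g : mergeTopGroups(List.of(shard0, shard1), 2)) {
            System.out.println(g.groupValue() + " -> " + g.topDocId() + " (shard " + g.shard() + ")");
        }
    }
}
```

Running main prints "b -> d7 (shard 1)" and then "a -> d1 (shard 0)": group b wins with the score reported by shard 1, and that shard's docid is the one GET_FIELDS would need, with no second phase involved. Equal scores are resolved here by group value, which a real implementation would have to keep consistent with the first-phase sort.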