Re: Question about grouping in distribute mode

Alan Woodward Thu, 30 Mar 2017 06:59:35 -0700

This also applies to non-distributed search, doesn’t it?  The Lucene-level 
FirstPassGroupingCollector doesn’t currently record the docid of the top doc 
for each group, but I don’t think there’s any reason it couldn’t: the docid is 
stored in the relevant FieldComparator.  It would also be a nice shortcut in 
GroupingSearch more generally.
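
To make the idea concrete, here is a minimal sketch in plain Java of a 
first-pass collector that also remembers the docid of the best document in each 
group.  It is only an illustration: TopDocPerGroupSketch, GroupHead, collect and 
topGroups are hypothetical names, not the Lucene FirstPassGroupingCollector API, 
and it assumes groups are ranked by the score of their best document, with ties 
broken by the lower docid.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy first-pass grouping collector (hypothetical, not a Lucene class). */
final class TopDocPerGroupSketch {

    /** Best hit seen so far for one group. */
    static final class GroupHead {
        final String groupValue;
        final int docId;
        final float score;

        GroupHead(String groupValue, int docId, float score) {
            this.groupValue = groupValue;
            this.docId = docId;
            this.score = score;
        }
    }

    private final Map<String, GroupHead> heads = new HashMap<>();

    /** Called once per matching document. */
    void collect(String groupValue, int docId, float score) {
        GroupHead current = heads.get(groupValue);
        // Keep the higher score; on equal scores keep the lower docid.
        if (current == null
                || score > current.score
                || (score == current.score && docId < current.docId)) {
            heads.put(groupValue, new GroupHead(groupValue, docId, score));
        }
    }

    /** Returns the top-k groups, each carrying the docid of its best document. */
    List<GroupHead> topGroups(int k) {
        List<GroupHead> sorted = new ArrayList<>(heads.values());
        sorted.sort((a, b) -> {
            int byScore = Float.compare(b.score, a.score); // descending score
            return byScore != 0 ? byScore : Integer.compare(a.docId, b.docId);
        });
        return new ArrayList<>(sorted.subList(0, Math.min(k, sorted.size())));
    }
}
```

With group.limit = 1, the docids returned by topGroups are already the 
documents a caller would want to display, which is the shortcut mentioned above.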


Alan Woodward
www.flax.co.uk


> On 30 Mar 2017, at 14:26, Diego Ceccarelli <diego.ceccare...@gmail.com> wrote:
> 
> Hello, I'm currently working on Solr grouping in order to support reranking 
> [1]. I have a working patch for non-distributed search, and I'm now working 
> on the distributed setting. 
> 
> Looking at the code, a distributed grouping search (top-k groups, top-n 
> documents for each group) consists of: 
> 
> GROUPING_DISTRIBUTED_FIRST 
> 1. Given the grouping query, each shard returns its top-k groups.
> 2. The federator merges the shards' top-k groups to produce the top-k groups 
> for the query.
> 
> GROUPING_DISTRIBUTED_SECOND
> 1. Given the top-k groups, each shard returns its top-n documents for 
> each group.
> 2. The federator then computes the top-n documents for each group by merging 
> the responses from all shards. 
> 
> GET_FIELDS
> as usual 
> 
> My plan was to change the collector in GROUPING_DISTRIBUTED_SECOND so that it 
> returns the top documents for each group with a new score given by the 
> reranking function (affecting maxScore for each group, and therefore also the 
> order of the groups). 
> Looking at the code, I then realized that TopGroups asserts that the order of 
> the groups does not change, and that indeed, if the ranking function is the 
> same, the group order can't change after the first stage. 
> 
> My question is: if the user is interested only in the top document for each 
> group (i.e., the default, group.limit = 1), do we really need 
> GROUPING_DISTRIBUTED_SECOND, or could we skip it? 
> Is there any reason to perform the second distributed grouping phase in this 
> case? Or could we just return the top docid together with the top groups in 
> GROUPING_DISTRIBUTED_FIRST and then go directly to GET_FIELDS? 
> 
> Cheers,
> Diego
> 
> [1] https://issues.apache.org/jira/browse/SOLR-8542 
>

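The shortcut asked about in the quoted message could look roughly like the 
sketch below: each shard's first-phase response carries, for every group, the 
docid and score of that group's best document, and the federator merges those 
responses directly.  This is plain Java with hypothetical names 
(FirstPhaseMergeSketch, ShardGroup, mergeTopGroups), not Solr code, and it 
assumes that groups and the documents inside a group are ranked by the same 
score, so that the best document of a group across all shards is the best of 
the per-shard best documents.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy federator-side merge of first-phase responses (hypothetical). */
final class FirstPhaseMergeSketch {

    /** One group as reported by one shard, with its best document. */
    static final class ShardGroup {
        final String groupValue;
        final String shard;
        final int docId;
        final float score;

        ShardGroup(String groupValue, String shard, int docId, float score) {
            this.groupValue = groupValue;
            this.shard = shard;
            this.docId = docId;
            this.score = score;
        }
    }

    /** Merges per-shard top groups into the global top-k groups. */
    static List<ShardGroup> mergeTopGroups(List<List<ShardGroup>> shardResponses, int k) {
        Map<String, ShardGroup> best = new HashMap<>();
        for (List<ShardGroup> response : shardResponses) {
            for (ShardGroup g : response) {
                // For each group keep the entry whose best document scores highest.
                best.merge(g.groupValue, g, (a, b) -> b.score > a.score ? b : a);
            }
        }
        List<ShardGroup> merged = new ArrayList<>(best.values());
        merged.sort((a, b) -> {
            int byScore = Float.compare(b.score, a.score); // descending score
            // Tie-break on the group value only to keep the order deterministic.
            return byScore != 0 ? byScore : a.groupValue.compareTo(b.groupValue);
        });
        return new ArrayList<>(merged.subList(0, Math.min(k, merged.size())));
    }
}
```

Each merged entry names the shard and docid of its group's top document, which 
is what a GET_FIELDS-style fetch would need when group.limit = 1.  The sketch 
does not cover the case where the sort within a group differs from the sort 
across groups.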