Re: Re: Question about grouping in distribute mode

380382...@qq.com Thu, 06 Apr 2017 02:53:31 -0700

Thanks for your help.
When I use the compositeId router, I find that group.ngroups returns a wrong
number. I would like to know what implementation mechanism causes this, and why
we must use the implicit router when we want grouping to work correctly.
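Here is a simplified model of why the count can come out wrong. It assumes that each shard counts the distinct groups among its own matches and that the coordinator simply adds those counts up. That is a simplification: the Solr Reference Guide only says that group.ngroups is accurate when all documents of a group are co-located on one shard. Under this model, a group whose documents are spread over several shards is counted once per shard. The class and method names are hypothetical, not Solr APIs.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model, not Solr code: per-shard group counts summed by the coordinator.
class NgroupsCountSketch {

    // Each shard counts the distinct group values among its own matching docs.
    static int countGroupsOnShard(List<String> groupValuesOfMatchingDocs) {
        return new HashSet<>(groupValuesOfMatchingDocs).size();
    }

    // Summing per-shard counts is exact only if no group spans two shards.
    static int summedNgroups(List<List<String>> shards) {
        int total = 0;
        for (List<String> shard : shards) {
            total += countGroupsOnShard(shard);
        }
        return total;
    }

    // The true answer needs the union of group values across all shards.
    static int exactNgroups(List<List<String>> shards) {
        Set<String> all = new HashSet<>();
        for (List<String> shard : shards) {
            all.addAll(shard);
        }
        return all.size();
    }

    public static void main(String[] args) {
        // Documents of group "b" were routed to both shards.
        List<List<String>> scattered = List.of(List.of("a", "b"), List.of("b", "c"));
        // Every document of a group lives on a single shard.
        List<List<String>> colocated = List.of(List.of("a", "b", "b"), List.of("c"));
        System.out.println(summedNgroups(scattered) + " vs " + exactNgroups(scattered)); // 4 vs 3
        System.out.println(summedNgroups(colocated) + " vs " + exactNgroups(colocated)); // 3 vs 3
    }
}
```

With the implicit router the client decides which shard each document goes to, so it can keep a whole group together. A compositeId shard-key prefix derived from the group value should, in principle, give the same co-location, although this thread does not confirm that.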

380382...@qq.com

From: Diego Ceccarelli (BLOOMBERG/ LONDON)
Date: 2017-04-06 17:16
To: 380382856
Subject: Re: Re: Question about grouping in distribute mode
Dear 380382856, 
I would be happy to help you if you can provide more information. Do you want
to know why grouping requires a specific routing strategy? My point is that
grouping usually involves three rounds of communication between the federator
and the shards, but when only one document per group is requested
(group.limit=1) it would be possible to obtain the same result with two.

Could I ask you to post your question on the Solr user mailing list [1]? That
way my answer will be useful to all Solr users, and people more expert than me
can also answer (or correct me if I say something wrong :))

Have a good day! 
Diego

[1] http://lucene.apache.org/solr/community.html#mailing-lists-irc

From: 380382...@qq.com At: 04/06/17 08:38:20
To: DIEGO CECCARELLI (BLOOMBERG/ LONDON)
Subject: Re: Re: Question about grouping in distribute mode
Hello, can you help me?
There is a problem that has been bothering me: why does SolrCloud require the
implicit routing strategy when group.ngroups is used?
380382...@qq.com

From: Diego Ceccarelli (BLOOMBERG/ LONDON)
Date: 2017-03-30 22:09
To: dev
Subject: Re: Question about grouping in distribute mode
Yes, I agree. And if there are no problems with the logic, it would improve
performance in both cases.

From: dev@lucene.apache.org At: 03/30/17 14:59:31
To: dev@lucene.apache.org
Subject: Re: Question about grouping in distribute mode
This is also the case for non-distributed search, isn’t it? The Lucene-level
FirstPassGroupingCollector doesn’t actually record the docid of the top doc for
each group at the moment, but I don’t think there’s any reason it couldn’t:
it’s stored in the relevant FieldComparator. And it would be a nice shortcut
in GroupingSearch more generally.
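To make the idea concrete, here is a minimal sketch in plain Java of a first pass that remembers, for every group, the docid of the document that set the group's best sort value. It is not Lucene's FirstPassGroupingCollector, whose internals differ, and it assumes groups are sorted by score. The records `Hit` and `GroupHead` are hypothetical. With that docid recorded, a caller that only wants one document per group would already have it after the first pass.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Lucene code: a first grouping pass that keeps,
// per group, the docid of the document with the best score.
class FirstPassWithTopDoc {

    record Hit(int docId, String group, float score) {}

    record GroupHead(String group, int topDocId, float topScore) {}

    static List<GroupHead> topGroups(List<Hit> hits, int k) {
        Map<String, GroupHead> heads = new HashMap<>();
        for (Hit h : hits) {
            GroupHead cur = heads.get(h.group());
            // Replace the head on a higher score; on a tie prefer the lower docid.
            boolean better = cur == null
                    || h.score() > cur.topScore()
                    || (h.score() == cur.topScore() && h.docId() < cur.topDocId());
            if (better) {
                heads.put(h.group(), new GroupHead(h.group(), h.docId(), h.score()));
            }
        }
        List<GroupHead> sorted = new ArrayList<>(heads.values());
        // Groups are ranked by their head document: higher score first, then lower docid.
        sorted.sort(Comparator.comparingDouble((GroupHead g) -> g.topScore()).reversed()
                .thenComparingInt(GroupHead::topDocId));
        return new ArrayList<>(sorted.subList(0, Math.min(k, sorted.size())));
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(
                new Hit(0, "a", 1.0f),
                new Hit(1, "b", 3.0f),
                new Hit(2, "a", 2.5f));
        for (GroupHead g : topGroups(hits, 2)) {
            System.out.println(g.group() + " -> doc " + g.topDocId());
        }
    }
}
```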

Alan Woodward
www.flax.co.uk

On 30 Mar 2017, at 14:26, Diego Ceccarelli <diego.ceccare...@gmail.com> wrote:

Hello, I'm currently working on Solr grouping in order to support reranking
[1].
I have a working patch for non-distributed search, and I'm now working on the
distributed setting.

Looking at the code for distributed grouping (top-k groups, top-n documents for
each group), the search consists of:

GROUPING_DISTRIBUTED_FIRST 
1. given the grouping query, each shard returns its top-k groups
2. the federator merges the shards' top-k groups and produces the top-k groups
for the query

GROUPING_DISTRIBUTED_SECOND
1. given the top-k groups, each shard returns its top-n documents for each
group.
2. the federator then computes the top-n documents for each group by merging
all the shards' responses.

GET_FIELDS
as usual 
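The federator side of the two grouping stages can be modelled roughly as follows. This is a simplified sketch under stated assumptions. Groups and documents are ranked by a single float score. Each shard's first-stage response is a map from group value to the score of that group's best document. The class and record names are hypothetical, not Solr's actual classes.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the federator's merge logic for distributed grouping.
class FederatorMergeSketch {

    record ScoredDoc(String shard, int docId, float score) {}

    // GROUPING_DISTRIBUTED_FIRST: each shard sends its top-k groups with the score of
    // each group's best document; keep the best score per group and take the top k.
    static List<String> mergeTopGroups(List<Map<String, Float>> shardTopGroups, int k) {
        Map<String, Float> best = new HashMap<>();
        for (Map<String, Float> shard : shardTopGroups) {
            shard.forEach((group, score) -> best.merge(group, score, Float::max));
        }
        List<String> groups = new ArrayList<>(best.keySet());
        // Higher best score first; ties broken alphabetically for a stable result.
        groups.sort(Comparator.comparing((String g) -> best.get(g)).reversed()
                .thenComparing(Comparator.<String>naturalOrder()));
        return new ArrayList<>(groups.subList(0, Math.min(k, groups.size())));
    }

    // GROUPING_DISTRIBUTED_SECOND: for one of those groups, each shard sends its top-n
    // documents; merge all of them and keep the global top n.
    static List<ScoredDoc> mergeGroupDocs(List<List<ScoredDoc>> shardDocs, int n) {
        List<ScoredDoc> all = new ArrayList<>();
        shardDocs.forEach(all::addAll);
        all.sort(Comparator.comparingDouble((ScoredDoc d) -> d.score()).reversed());
        return new ArrayList<>(all.subList(0, Math.min(n, all.size())));
    }
}
```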

My plan was to change the collector in GROUPING_DISTRIBUTED_SECOND and return
the top documents for each group with a new score given by the function used to
rerank (affecting maxScore for each group, and then also the order of the
groups).
Looking at the code, I then realized that TopGroups asserts that the order of
the groups does not change, and that indeed _if the ranking function is the
same, group order can't change after the first stage_.
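A toy example makes the observation about group order easy to check. In this sketch, groups are ordered by the maximum score of their documents. The names are hypothetical, and `rerankBoost` stands in for whatever extra score a rerank function would add. Recomputing the order with the same scoring function reproduces the first-stage order. A rerank function, by contrast, can swap two groups, which is the kind of reordering that TopGroups asserts against.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToDoubleFunction;

// Toy illustration: group order is driven by each group's best-scoring document.
class GroupOrderSketch {

    // rerankBoost is a hypothetical stand-in for the extra score a rerank function adds.
    record Doc(String group, double baseScore, double rerankBoost) {}

    static List<String> groupOrder(List<Doc> docs, ToDoubleFunction<Doc> scorer) {
        Map<String, Double> maxScore = new HashMap<>();
        for (Doc d : docs) {
            maxScore.merge(d.group(), scorer.applyAsDouble(d), Double::max);
        }
        List<String> order = new ArrayList<>(maxScore.keySet());
        order.sort(Comparator.comparing((String g) -> maxScore.get(g)).reversed()
                .thenComparing(Comparator.<String>naturalOrder()));
        return order;
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(
                new Doc("a", 2.0, 0.0),
                new Doc("b", 1.5, 1.0));
        List<String> firstStage = groupOrder(docs, Doc::baseScore);
        List<String> sameFunction = groupOrder(docs, Doc::baseScore);
        List<String> reranked = groupOrder(docs, d -> d.baseScore() + d.rerankBoost());
        System.out.println(firstStage + " " + sameFunction + " " + reranked); // [a, b] [a, b] [b, a]
    }
}
```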

My question is: if the user is interested only in the top document for each
group (i.e., the default: group.limit = 1), do we really need
GROUPING_DISTRIBUTED_SECOND, or could we skip it?
Is there any reason to perform GROUPING_DISTRIBUTED_SECOND in this case, or
could we just return the top docid together with the top groups in
GROUPING_DISTRIBUTED_FIRST and then go directly to GET_FIELDS?
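The proposed shortcut comes down to a small decision about which stages to run. The sketch below is hypothetical. Its enum reuses the stage names from this message rather than Solr's internal constants. It assumes the first stage has been extended to return each group's top docid. It also assumes that documents inside a group are sorted the same way as the groups themselves: if group.sort differs from sort, the group head found in the first stage need not be the group's top document.

```java
import java.util.List;

// Hypothetical stage planner for the group.limit=1 shortcut discussed in this thread.
class StagePlanSketch {

    // Stage names borrowed from the email; not Solr's actual constants.
    enum Stage { GROUPING_DISTRIBUTED_FIRST, GROUPING_DISTRIBUTED_SECOND, GET_FIELDS }

    static List<Stage> plan(int groupLimit,
                            boolean firstStageReturnsTopDocId,
                            boolean withinGroupSortMatchesGroupSort) {
        boolean canSkipSecond = groupLimit == 1
                && firstStageReturnsTopDocId
                && withinGroupSortMatchesGroupSort;
        if (canSkipSecond) {
            // Two rounds: the group heads already are the documents to return.
            return List.of(Stage.GROUPING_DISTRIBUTED_FIRST, Stage.GET_FIELDS);
        }
        // Default: three rounds between the federator and the shards.
        return List.of(Stage.GROUPING_DISTRIBUTED_FIRST,
                Stage.GROUPING_DISTRIBUTED_SECOND,
                Stage.GET_FIELDS);
    }
}
```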

Cheers,
Diego

[1] https://issues.apache.org/jira/browse/SOLR-8542
