Why do you need to know the group size?
Did I miss a transition in what exactly you are talking about?

On 10/11/07 2:57 PM, "Joydeep Sen Sarma" <[EMAIL PROTECTED]> wrote:

> Yeah - I am doing it with two MR jobs right now.
>
> ...
>
> (One of the issues is that the optimal implementation requires
> anticipating the group size. Easy to do by custom code, hard to do
> automatically .. (would have to maintain approximate counts of distinct
> values by each dimension))
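The "approximate counts of distinct values by each dimension" mentioned in the quote could look roughly like the sketch below. It is an illustration, not what either poster described. It assumes linear counting, a fixed-size-bitmap distinct-count estimator, in place of whatever technique was actually intended. `LinearCounter`, `distinct_counts_by_dimension`, and `num_bits` are hypothetical names. Each dimension's estimate approximates how many groups a group-by on that dimension would produce.

```python
import hashlib
import math


class LinearCounter:
    """Hypothetical approximate distinct-value counter (linear counting)."""

    def __init__(self, num_bits=1024):
        self.num_bits = num_bits
        self.bits = bytearray(num_bits)  # one byte per bucket, for simplicity

    def add(self, value):
        # Hash the value's string form (so 1 and "1" collide; fine for a sketch)
        # and mark the corresponding bucket.
        digest = hashlib.sha256(str(value).encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % self.num_bits
        self.bits[index] = 1

    def estimate(self):
        zeros = self.num_bits - sum(self.bits)
        if zeros == 0:
            # Bitmap saturated: no usable estimate; a larger num_bits is needed.
            return math.inf
        # Linear counting estimate: n ~= -m * ln(V), V = fraction of empty buckets.
        return -self.num_bits * math.log(zeros / self.num_bits)


def distinct_counts_by_dimension(records, dimensions, num_bits=1024):
    """Estimate the number of distinct values (i.e. groups) per dimension."""
    counters = {dim: LinearCounter(num_bits) for dim in dimensions}
    for record in records:
        for dim in dimensions:
            counters[dim].add(record[dim])
    return {dim: counter.estimate() for dim, counter in counters.items()}
```

Memory per dimension is fixed at `num_bits` regardless of input size. Two bitmaps built over different input splits can be combined with a bitwise OR, so per-mapper counters could in principle be merged before the group size is estimated.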
