Hi,

Do you mean to do the resplitting and recombining in each mapper task? I am not sure what the purpose would be: as I understand it, the Partitioner determines which reducer the output of a mapper task goes to. So I don't think your method can solve the skew problem.
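For context, the partitioning step referred to above is a single hash over the key, independent of how the map output is laid out. The sketch below is roughly what Hadoop's default HashPartitioner does; treat it as an illustration rather than the exact shipped source:

    import org.apache.hadoop.mapreduce.Partitioner;

    // Picks a reducer from the key's hash alone; the value and the
    // layout of the map output play no part in the choice.
    public class HashPartitioner<K, V> extends Partitioner<K, V> {
      @Override
      public int getPartition(K key, V value, int numReduceTasks) {
        // Mask off the sign bit so the index is never negative.
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
      }
    }

Because the reducer is chosen purely from key.hashCode(), regrouping map output on the mapper side does not change which reducer a key lands on, so a hot key or a hot hash bucket stays skewed.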
2010/2/4 易剑 <[email protected]>

> Currently only map tasks are balanced; reduce tasks may be skewed, and
> their timeslices differ as well, which keeps the scheduler from being
> smart. I have an idea to improve this.
>
> We can break the output of a map into N*M splits, where N is the number
> of nodes and M >= 1, and regroup them into new splits by combining the
> smaller splits and resplitting the bigger splits, until the size of
> every split is balanced against a specified target value.
>
> There are three cases:
> 1. Too many values for a key
> 2. Too many keys hash to a partition
> 3. Every partition is balanced in size
>
> If there are too many values for a key, adding a new MapReduce procedure
> is necessary.
> If too many keys hash to a partition, resplitting is necessary.
>
> If every split is balanced, we can treat a task (map or reduce) as a
> scheduler timeslice, and the scheduler will be smart like an OS
> scheduler.

--
Best Regards
Jeff Zhang
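To make the quoted proposal concrete, here is a minimal sketch of its combine/resplit pass. Everything in it is hypothetical: Split, rebalance, and targetSize are illustrative names, not Hadoop APIs. It covers cases 2 and 3 only; case 1 (too many values for a single key) cannot be fixed by regrouping and needs the extra MapReduce pass the proposal mentions.

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.Comparator;
    import java.util.List;
    import java.util.PriorityQueue;

    // Hypothetical stand-in for a chunk of map output; only its size
    // matters to the balancing logic sketched here.
    class Split {
        final long size;
        Split(long size) { this.size = size; }
    }

    public class SplitRebalancer {

        // Regroup splits so no resulting split exceeds targetSize:
        // oversized splits are cut into target-sized pieces (case 2),
        // undersized splits are greedily packed together (case 3).
        static List<Split> rebalance(List<Split> splits, long targetSize) {
            List<Split> result = new ArrayList<>();
            PriorityQueue<Split> small =
                new PriorityQueue<>(Comparator.comparingLong((Split s) -> s.size));

            for (Split s : splits) {
                long remaining = s.size;
                // Resplit anything bigger than the target.
                while (remaining > targetSize) {
                    result.add(new Split(targetSize));
                    remaining -= targetSize;
                }
                if (remaining > 0) {
                    small.add(new Split(remaining));
                }
            }

            // Combine the leftovers, smallest first, so each group
            // approaches the target without ever exceeding it.
            long acc = 0;
            while (!small.isEmpty()) {
                Split s = small.poll();
                if (acc > 0 && acc + s.size > targetSize) {
                    result.add(new Split(acc));
                    acc = 0;
                }
                acc += s.size;
            }
            if (acc > 0) {
                result.add(new Split(acc));
            }
            return result;
        }

        public static void main(String[] args) {
            List<Split> input = Arrays.asList(
                new Split(300), new Split(40), new Split(70), new Split(120));
            for (Split s : rebalance(input, 100)) {
                System.out.println("split of size " + s.size);
            }
        }
    }

This is a naive greedy pass: every output split is at most targetSize, though packed groups may land somewhat under it. Note that it balances sizes only; as the reply above points out, the Partitioner still decides which reducer each key reaches, so this regrouping alone does not address reducer-side skew.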
