Hi,

There are two purposes:
1. Load-balance both map and reduce tasks to solve the skew problem.
2. By controlling the scale of a task, a task can be regarded as a timeslice of scheduling.
The first is the precondition of the second. How can the skew problem be solved? I will describe it in detail shortly. I believe it is feasible.

2010/2/4 Jeff Zhang <[email protected]>

> Hi,
>
> Do you mean re-splitting and recombining in each map task? I am not sure
> of the purpose; as I understand it, the Partitioner determines which
> reducer the output of a map task goes to, so I don't think your method
> can solve the skew problem.
>
> 2010/2/4 易剑 <[email protected]>
>
> > Currently, only map tasks are balanced; reduce tasks may be skewed, and
> > their timeslices differ, which prevents the scheduler from being smart.
> > I have an idea to improve this.
> >
> > We can break the output of the map phase into N*M splits, where N is
> > the number of nodes and M >= 1, and regroup them into new splits by
> > combining the smaller splits and re-splitting the bigger splits, until
> > the size of every split is balanced against a specified value.
> >
> > There are three cases:
> > 1. Too many values for a key
> > 2. Too many keys hash to a partition
> > 3. Every partition is balanced in size
> >
> > If there are too many values for a key, adding a new MapReduce
> > procedure is necessary. If too many keys hash to a partition,
> > re-splitting is necessary.
> >
> > If every split is balanced, we can treat a task (map or reduce) as a
> > scheduler timeslice, and the scheduler will be as smart as an OS
> > scheduler.
>
> --
> Best Regards
>
> Jeff Zhang

--
Hadoop Technology Forum
http://bbs.hadoopor.com
http://www.hadoopor.com
http://forum.hadoopor.com
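The combine/re-split idea in the quoted proposal can be sketched roughly as below. This is a minimal illustration, not Hadoop code: it models each map-output split only by its size, and the function name `rebalance` and the `target_size` parameter are hypothetical.

```python
# Hypothetical sketch of the proposal: cut splits larger than the
# target into target-sized chunks (re-splitting), and pool splits
# smaller than the target together (combining), so every resulting
# split is at most target_size and the total size is preserved.

def rebalance(split_sizes, target_size):
    balanced = []
    buffer = 0  # accumulates small splits until they reach the target
    for size in split_sizes:
        # Re-split: a split bigger than the target is cut into chunks.
        while size > target_size:
            balanced.append(target_size)
            size -= target_size
        # Combine: pool the remainder with other small splits.
        buffer += size
        if buffer >= target_size:
            balanced.append(target_size)
            buffer -= target_size
    if buffer:
        balanced.append(buffer)
    return balanced
```

For example, `rebalance([5, 1, 1, 9], 4)` yields four splits of size 4, so every task covers the same amount of data and can serve as a uniform scheduling timeslice. Note this only addresses case 2 (too many keys per partition); case 1 (too many values for one key) cannot be fixed by re-splitting alone, which is why the proposal adds an extra MapReduce pass for it.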
