Hi Starry,

The "assignmultiple" feature in the Fair Scheduler fixes this issue as of
MAPREDUCE-706. It is not in any current release, but we will be rolling it
into the next major release of Cloudera's distribution.
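Once you're on a release that includes it, enabling it should look roughly
like this (a sketch only; this assumes the Fair Scheduler is already your
configured task scheduler, and uses the property names from the Fair
Scheduler docs):

```xml
<!-- In the JobTracker's conf XML: use the Fair Scheduler and let it
     assign multiple tasks per TaskTracker heartbeat. -->
<property>
  <name>mapred.jobtracker.taskScheduler</name>
  <value>org.apache.hadoop.mapred.FairScheduler</value>
</property>
<property>
  <name>mapred.fairscheduler.assignmultiple</name>
  <value>true</value>
</property>
```

A JobTracker restart is needed for the scheduler change to take effect.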

-Todd

P.S. Please don't cross-post questions to 4 lists. It's not necessary and
just makes the archives noisy. Thanks.

On Mon, Dec 21, 2009 at 8:13 AM, Starry SHI <[email protected]> wrote:

> Hi guys,
>
> Thank you for your reply!
>
> I have already set the configurations, and my cluster's Map/Reduce capacity
> is already more than 2. Also, the data are large enough to be split into
> more than 2 map tasks. However, no matter how I set the configuration, only
> 2 map tasks run during the first heartbeat interval. For example, with 4
> map tasks, I expect the launch to look like this:
>
> task   |  offset to start time
> --------------------------------
> map1 |     0sec
> map2 |     0~2sec
> map3 |     0~2sec
> map4 |     0~2sec
>
> This shows the 4 map tasks being launched simultaneously.
>
> But the result turned out to be:
>
> task   |  offset to start time
> --------------------------------
> map1 |     0sec
> map2 |     1sec
> map3 |     5sec
> map4 |     6sec
>
> This shows that in the first heartbeat (5 sec), only two map tasks are
> launched, while the other 2 wait until the next heartbeat. Why can't the
> four map tasks be launched together?
>
> The 5 sec delay is amplified as the total number of machines increases, and
> the execution time grows significantly. If we could launch as many map
> tasks as possible in one heartbeat, taking all the available slots, there
> would be a great improvement in performance.
>
> I would like to hear your suggestions and opinions on this.
>
> Best regards,
> Starry
>
> /* Tomorrow is another day. So is today. */
>
>
> On Mon, Dec 21, 2009 at 14:41, Chandraprakash Bhagtani <
> [email protected]
> > wrote:
>
> > You can increase the map/reduce slots only via the
> > "mapred.tasktracker.map(reduce).tasks.maximum" property.
> >
> > There can be the following cases:
> >
> > 1. Your changes are not taking effect. You need to restart the cluster
> > after making changes in the conf XML. You can check your cluster's
> > (Map/Reduce) capacity at http://jobtracker-address:50030/
> >
> > 2. Your data is not large enough to create more than 2 map tasks. But in
> > that case the number of reducers should still be equal to
> > mapred.reduce.tasks.
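> > For reference, the slot settings go in each TaskTracker's conf XML
> > (hadoop-site.xml on the 0.19 line, if I recall correctly) and take
> > effect only after a TaskTracker restart; the values below are just an
> > example:
> >
> > ```xml
> > <!-- Per-TaskTracker slot limits (example values). -->
> > <property>
> >   <name>mapred.tasktracker.map.tasks.maximum</name>
> >   <value>10</value>
> > </property>
> > <property>
> >   <name>mapred.tasktracker.reduce.tasks.maximum</name>
> >   <value>10</value>
> > </property>
> > ```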
> >
> > On Mon, Dec 21, 2009 at 9:39 AM, Starry SHI <[email protected]> wrote:
> >
> > > Hi,
> > >
> > > I am currently using Hadoop 0.19.2 for large data processing. But I
> > > noticed that when a job is launched, only two map/reduce tasks run at
> > > the very beginning; after one heartbeat (5 sec), another two map/reduce
> > > tasks are started. How can I increase the map/reduce slots?
> > >
> > > In the configuration file, I have already set
> > > "mapred.tasktracker.map(reduce).tasks.maximum" to 10, and
> > > "mapred.map(reduce).tasks" to 10. But still only 2 are launched.
> > >
> > > Eager to hear your solutions!
> > >
> > > Best regards,
> > > Starry
> > >
> > > /* Tomorrow is another day. So is today. */
> > >
> >
> >
> >
> > --
> > Thanks & Regards,
> > Chandra Prakash Bhagtani,
> > Impetus Infotech (india) Pvt Ltd.
> >
>
