1) The number of map/reduce slots per task tracker is fixed at task tracker
start time, not at job start time.
2) The rate at which tasks are launched is relatively slow through Hadoop 0.19.
3) The number of tasks for a job is determined by the number of input files
and the computed split size goal for the input format.
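To make point 3 concrete, here is a rough sketch of how the split count (and
hence the number of map tasks) falls out of the input file sizes and the
requested number of maps. The function and parameter names are illustrative,
not Hadoop's actual API, but the max/min formula mirrors the split-size
computation in FileInputFormat of that era:

```python
# Illustrative sketch (not the real Hadoop API) of how the number of map
# tasks is derived from input files and the split size goal.

def compute_split_size(goal_size, min_size, block_size):
    # Mirrors splitSize = max(minSize, min(goalSize, blockSize))
    return max(min_size, min(goal_size, block_size))

def num_map_tasks(file_sizes, requested_maps,
                  block_size=64 * 1024 * 1024, min_size=1):
    """Estimate how many map tasks a job gets for the given input files."""
    total = sum(file_sizes)
    # The goal size is the total input divided by the requested map count.
    goal = max(total // max(requested_maps, 1), 1)
    split_size = compute_split_size(goal, min_size, block_size)
    # Files are split independently; every non-empty file yields at least
    # one split, which is why many small files mean many map tasks.
    splits = 0
    for size in file_sizes:
        splits += max(1, -(-size // split_size))  # ceiling division
    return splits
```

For example, a single 256 MB file with 4 requested maps and a 64 MB block
size yields 4 splits, while two tiny files always yield at least 2.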



On Mon, Dec 21, 2009 at 11:00 AM, Todd Lipcon <[email protected]> wrote:

> Hi Starry,
>
> The "assignmultiple" feature in the Fair Scheduler fixes this issue after
> MAPREDUCE-706. This is not in any current releases, but we will be rolling
> it into our next major release of Cloudera's distribution.
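> For reference, a minimal sketch of the relevant Fair Scheduler setting
> (property name per the Fair Scheduler docs; whether multiple tasks per
> heartbeat actually works depends on having the MAPREDUCE-706 change):
>
> ```xml
> <property>
>   <name>mapred.fairscheduler.assignmultiple</name>
>   <value>true</value>
>   <description>Let the Fair Scheduler assign more than one task to a
>   TaskTracker in a single heartbeat.</description>
> </property>
> ```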
>
> -Todd
>
> P.S. Please don't cross-post questions to 4 lists. It's not necessary and
> just makes the archives noisy. Thanks.
>
> On Mon, Dec 21, 2009 at 8:13 AM, Starry SHI <[email protected]> wrote:
>
> > Hi guys,
> >
> > Thank you for your reply!
> >
> > I have already set those configuration properties, and my cluster's
> > Map/Reduce capacity is already more than 2. The data are also large
> > enough to be split into more than 2 map tasks. However, when I launch
> > the job, no matter how I set the configuration, only 2 map tasks are
> > running in the first heartbeat interval. To give an example, suppose I
> > use 4 map tasks. I expect the launch to look like this:
> >
> > task   |  offset to start time
> > --------------------------------
> > map1 |     0sec
> > map2 |     0~2sec
> > map3 |     0~2sec
> > map4 |     0~2sec
> >
> > This would mean the 4 map tasks are launched simultaneously.
> >
> > But the result turned out to be:
> >
> > task   |  offset to start time
> > --------------------------------
> > map1 |     0sec
> > map2 |     1sec
> > map3 |     5sec
> > map4 |     6sec
> >
> > This shows that in the first heartbeat (5 sec), only two map tasks are
> > launched; the other 2 wait until the next heartbeat. I wonder why the
> > four map tasks cannot be launched together.
> >
> > The 5-second delay is amplified as the total number of machines
> > increases, and the execution time is delayed significantly. I think that
> > if we could launch as many map tasks as possible in one heartbeat,
> > taking all the available slots, there would be a great improvement in
> > performance.
> >
> > I would like to hear your suggestions and opinions on this.
> >
> > Best regards,
> > Starry
> >
> > /* Tomorrow is another day. So is today. */
> >
> >
> > On Mon, Dec 21, 2009 at 14:41, Chandraprakash Bhagtani <
> > [email protected]
> > > wrote:
> >
> > > You can only increase the map/reduce slots using the
> > > "mapred.tasktracker.map(reduce).tasks.maximum" property.
> > >
> > > There can be the following cases:
> > >
> > > 1. Your changes are not taking effect. You need to restart the cluster
> > > after making changes in the conf XML. You can check your cluster's
> > > Map/Reduce capacity at http://jobtracker-address:50030/
> > >
> > > 2. Your data is not large enough to create more than 2 map tasks. But
> > > in that case the number of reducers should still be equal to
> > > mapred.reduce.tasks.
> > >
> > > On Mon, Dec 21, 2009 at 9:39 AM, Starry SHI <[email protected]>
> wrote:
> > >
> > > > Hi,
> > > >
> > > > I am currently using Hadoop 0.19.2 to run large data processing
> > > > jobs. But I noticed that when a job is launched, only two map/reduce
> > > > tasks run at the very beginning; after one heartbeat (5 sec),
> > > > another two map/reduce tasks are started. I want to ask how I can
> > > > increase the map/reduce slots?
> > > >
> > > > In the configuration file, I have already set
> > > > "mapred.tasktracker.map(reduce).tasks.maximum" to 10, and
> > > > "mapred.map(reduce).tasks" to 10. But there are still only 2
> > > > launched.
> > > >
> > > > Eager to hear your solutions!
> > > >
> > > > Best regards,
> > > > Starry
> > > >
> > > > /* Tomorrow is another day. So is today. */
> > > >
> > >
> > >
> > >
> > > --
> > > Thanks & Regards,
> > > Chandra Prakash Bhagtani,
> > > Impetus Infotech (india) Pvt Ltd.
> > >
> >
>



-- 
Pro Hadoop, a book to guide you from beginner to hadoop mastery,
http://www.amazon.com/dp/1430219424?tag=jewlerymall
www.prohadoopbook.com a community for Hadoop Professionals
