Hi Starry,

The "assignmultiple" feature in the Fair Scheduler fixes this issue after MAPREDUCE-706. It is not in any current release, but we will be rolling it into the next major release of Cloudera's distribution.
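(For reference, a minimal mapred-site.xml sketch of what enabling this might look like once the patch ships; the `mapred.fairscheduler.assignmultiple` property name is assumed from the Fair Scheduler's naming convention and may differ in the final release.)

```xml
<!-- Select the Fair Scheduler on the JobTracker. -->
<property>
  <name>mapred.jobtracker.taskScheduler</name>
  <value>org.apache.hadoop.mapred.FairScheduler</value>
</property>
<!-- Hypothetical: allow assigning multiple tasks per TaskTracker
     heartbeat (property name assumed, not yet in a release). -->
<property>
  <name>mapred.fairscheduler.assignmultiple</name>
  <value>true</value>
</property>
```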
-Todd

P.S. Please don't cross-post questions to four lists. It's not necessary and just makes the archives noisy. Thanks.

On Mon, Dec 21, 2009 at 8:13 AM, Starry SHI <[email protected]> wrote:
> Hi guys,
>
> Thank you for your reply!
>
> I have already set the configurations, and my cluster's Map/Reduce
> capacity is already more than 2. Also, the data are large enough to be
> split into more than 2 map tasks. However, when I launch the job, no
> matter how I set the configuration, only 2 map tasks run within the
> first heartbeat interval. To give an example, with 4 map tasks I expect
> the launch to look like this:
>
> task | offset to start time
> --------------------------------
> map1 | 0sec
> map2 | 0~2sec
> map3 | 0~2sec
> map4 | 0~2sec
>
> This would show the 4 map tasks being launched simultaneously.
>
> But the result turned out to be:
>
> task | offset to start time
> --------------------------------
> map1 | 0sec
> map2 | 1sec
> map3 | 5sec
> map4 | 6sec
>
> This shows that in the first heartbeat (5sec), only two map tasks are
> launched; the other 2 wait until the next heartbeat. I wonder why the
> four map tasks cannot be launched together.
>
> The 5sec delay is amplified as the total number of machines increases,
> and the execution time is delayed significantly. I think if we can
> arrange for as many map tasks as possible to start within one heartbeat,
> taking all the available slots, there will be a great improvement in
> performance.
>
> I would like to hear your suggestions and opinions on this.
>
> Best regards,
> Starry
>
> /* Tomorrow is another day. So is today. */
>
>
> On Mon, Dec 21, 2009 at 14:41, Chandraprakash Bhagtani <
> [email protected]> wrote:
>
> > You can only increase the map/reduce slots using the
> > "mapred.tasktracker.map(reduce).tasks.maximum" property.
> >
> > There can be the following cases:
> >
> > 1. Your changes are not taking effect. You need to restart the cluster
> > after making changes in the conf XML. You can check your cluster's
> > (Map/Reduce) capacity at http://jobtracker-address:50030/
> >
> > 2. Your data is not enough to create more than 2 map tasks. But in
> > that case, reducers should still be equal to mapred.reduce.tasks.
> >
> > On Mon, Dec 21, 2009 at 9:39 AM, Starry SHI <[email protected]> wrote:
> >
> > > Hi,
> > >
> > > I am currently using Hadoop 0.19.2 to run large data processing. But
> > > I noticed that when the job is launched, there are only two
> > > map/reduce tasks running at the very beginning. After one heartbeat
> > > (5sec), another two map/reduce tasks are started. I want to ask how
> > > I can increase the map/reduce slots?
> > >
> > > In the configuration file, I have already set
> > > "mapred.tasktracker.map(reduce).tasks.maximum" to 10, and
> > > "mapred.map(reduce).tasks" to 10. But still only 2 are launched.
> > >
> > > Eager to hear your solutions!
> > >
> > > Best regards,
> > > Starry
> > >
> > > /* Tomorrow is another day. So is today. */
> >
> > --
> > Thanks & Regards,
> > Chandra Prakash Bhagtani,
> > Impetus Infotech (India) Pvt Ltd.
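(Editor's note: for readers following along, the slot settings discussed in the quoted messages go in mapred-site.xml roughly as below. These are the stock Hadoop 0.19-era property names; the values shown are just the ones from the thread, and the `*.tasks.maximum` changes only take effect after restarting the TaskTrackers.)

```xml
<!-- Per-TaskTracker slot limits (require a TaskTracker restart). -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>10</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>10</value>
</property>
<!-- Per-job task counts; mapred.map.tasks is only a hint to the framework. -->
<property>
  <name>mapred.map.tasks</name>
  <value>10</value>
</property>
<property>
  <name>mapred.reduce.tasks</name>
  <value>10</value>
</property>
```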
