Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.
The "LimitingTaskSlotUsage" page has been changed by SomeOtherAccount. http://wiki.apache.org/hadoop/LimitingTaskSlotUsage?action=diff&rev1=4&rev2=5
--------------------------------------------------
The CapacityScheduler in 0.21 has a feature whereby one may use RAM-per-task limits to control how many slots a given task occupies. By careful use of this feature, one may limit how many concurrent tasks a job runs on a given node.

= Increasing the Number of Slots Used =

Both job-level and server-level tunables affect how many tasks run concurrently.

== Increase the number of tasks per node ==

Two server tunables determine how many tasks a given TaskTracker will run on a node:

 * mapred.tasktracker.map.tasks.maximum sets the number of map slots
 * mapred.tasktracker.reduce.tasks.maximum sets the number of reduce slots

These must be set in the mapred-site.xml file on each TaskTracker, and the TaskTracker must be restarted for the change to take effect. The new values should then appear on the JobTracker main page. Note that these are '''not''' set by your job.

== Increase the number of map tasks ==

Typically, Hadoop determines the number of maps per job from the InputFormat and the block size in effect. The mapred.min.split.size and mapred.max.split.size settings provide hints that the system should use minimum and maximum split sizes different from the block size, which in turn changes how many map tasks are created.

== Increase the number of reduce tasks ==

Currently, the number of reduces is determined by the job: set mapred.reduce.tasks to the appropriate number of reduces. When using Pig, use the PARALLEL keyword.
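As a sketch of the server-side settings above, a mapred-site.xml fragment on a TaskTracker might look like the following (the slot counts of 4 and 2 are illustrative values, not recommendations; choose them to match the node's CPU and memory):

{{{
<configuration>
  <!-- Maximum number of map tasks this TaskTracker will run concurrently -->
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>4</value>
  </property>
  <!-- Maximum number of reduce tasks this TaskTracker will run concurrently -->
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
  </property>
</configuration>
}}}

After editing the file, restart the TaskTracker; the updated slot counts should be visible on the JobTracker main page.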

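The job-level tunables can be passed on the command line when the job's driver uses ToolRunner/GenericOptionsParser; the jar and class names below are hypothetical placeholders, and the values are illustrative:

{{{
hadoop jar myjob.jar MyJobDriver \
  -Dmapred.min.split.size=134217728 \
  -Dmapred.max.split.size=268435456 \
  -Dmapred.reduce.tasks=8 \
  /input/path /output/path
}}}

Here the split-size hints (in bytes) influence how many map tasks are created, while mapred.reduce.tasks sets the reduce count directly.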