Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change 
notification.

The "LimitingTaskSlotUsage" page has been changed by SomeOtherAccount.
http://wiki.apache.org/hadoop/LimitingTaskSlotUsage?action=diff&rev1=1&rev2=2

--------------------------------------------------

- 
+ = Limiting Task Slot Usage =
  
  There are many reasons why one might want to limit the number of running tasks. 
  
- * Job is consuming all task slots
+ == Job is consuming all task slots ==
  
  The most common reason is that a given job is consuming all of the available task slots, preventing other jobs from running.  The easiest and best solution is to switch from the default FIFO scheduler to another scheduler, such as the FairShareScheduler or the CapacityScheduler.  Both support limiting the number of tasks a single job may run at once.
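
  With the FairShareScheduler, for example, such a cap can be expressed per pool in its allocations file.  The following is a minimal sketch, not a drop-in config: the pool name and the limits are invented for illustration, and the element names should be checked against the scheduler documentation for your release.

{{{
<?xml version="1.0"?>
<allocations>
  <!-- Hypothetical pool: jobs submitted to it may hold at most
       20 map slots and 10 reduce slots at any one time. -->
  <pool name="limited">
    <maxMaps>20</maxMaps>
    <maxReduces>10</maxReduces>
  </pool>
</allocations>
}}}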
  
- * Job has taken too many reduce slots that are still waiting for maps to 
finish
+ == Job has taken too many reduce slots that are still waiting for maps to 
finish ==
  
  There is a job tunable called mapred.reduce.slowstart.completed.maps that sets the percentage of maps that must be completed before reduce tasks are fired off.  By default, this is set to 5% (0.05), which is likely too low for most shared clusters.  Recommended values are closer to 80% (0.80) or higher.  Note that for jobs that produce a significant amount of intermediate data, setting this value higher means reduce tasks will spend more of their running time fetching that data before performing any work.
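
  The value can be set cluster-wide in mapred-site.xml or overridden per job.  As a sketch (the 0.80 figure simply follows the recommendation above):

{{{
<property>
  <name>mapred.reduce.slowstart.completed.maps</name>
  <value>0.80</value>
  <description>Launch reduces only after 80% of maps have completed.</description>
</property>
}}}

  Jobs whose drivers go through ToolRunner can also pass it on the command line as -Dmapred.reduce.slowstart.completed.maps=0.80.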
  
- * Job is referencing an external, limited resource (such as a database)
+ == Job is referencing an external, limited resource (such as a database) ==
  
  In Hadoop terms, we call this a 'side-effect'.
  
@@ -20, +20 @@

  
  If a task absolutely must break the rules, there are a few things one can do:
  
- ** Deploy ZooKeeper and use it as a persistent lock to keep track of how many 
tasks are running concurrently
+  * Deploy ZooKeeper and use it as a persistent lock to keep track of how many 
tasks are running concurrently (see the sketch after this list)
- ** Use a scheduler with a maximum task-per-queue feature and submit the job 
to that queue
+  * Use a scheduler with a maximum task-per-queue feature and submit the job 
to that queue
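
  For the ZooKeeper approach, one possibility is a counting semaphore built on ephemeral sequential znodes.  The sketch below assumes a running ZooKeeper ensemble, an already-connected client, and an already-created parent znode; the path /task-slots, the cap of 10, and the class name are all illustrative, not part of any Hadoop or ZooKeeper API.

{{{
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

/** Illustrative cap on concurrent tasks; not a Hadoop API. */
public class TaskSlotSemaphore {
  private static final String LOCK_DIR = "/task-slots"; // hypothetical, must pre-exist
  private static final int MAX_CONCURRENT = 10;         // illustrative limit

  private final ZooKeeper zk;
  private String myNode;

  public TaskSlotSemaphore(ZooKeeper zk) { this.zk = zk; }

  /** Blocks until this task holds one of the MAX_CONCURRENT slots. */
  public void acquire() throws KeeperException, InterruptedException {
    // Ephemeral-sequential: the node disappears if the task dies,
    // so a crashed task cannot hold a slot forever.
    myNode = zk.create(LOCK_DIR + "/task-", new byte[0],
        ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
    String myName = myNode.substring(myNode.lastIndexOf('/') + 1);

    while (true) {
      // Re-arm a one-shot watch each pass; any create/delete under
      // LOCK_DIR releases the latch (countDown before await is safe).
      final CountDownLatch changed = new CountDownLatch(1);
      List<String> children = zk.getChildren(LOCK_DIR, new Watcher() {
        public void process(WatchedEvent event) { changed.countDown(); }
      });
      Collections.sort(children);
      if (children.indexOf(myName) < MAX_CONCURRENT) {
        return; // among the oldest MAX_CONCURRENT waiters: slot acquired
      }
      changed.await();
    }
  }

  /** Releases the slot so another waiting task may proceed. */
  public void release() throws KeeperException, InterruptedException {
    zk.delete(myNode, -1);
  }
}
}}}

  A task would call acquire() before touching the shared resource and release() (or simply exit, dropping its ephemeral node) when done.  Every waiter re-checks on every change, which is fine for tens of tasks but becomes a herd effect at larger scale.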
  
- * Job consumes too much RAM/disk IO/etc on a given node
+ == Job consumes too much RAM/disk I/O/etc. on a given node ==
  
  The CapacityScheduler in 0.21 has a feature whereby the amount of RAM requested per task determines how many slots that task occupies.  With careful use of this feature, one can limit how many of a job's tasks run concurrently on a given node. 
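
  As a sketch of how that looks in 0.21-era configuration (the property names below should be verified against your release): the cluster defines the memory size of one slot, and a job that declares a larger per-task footprint is charged multiple slots, so fewer of its tasks fit on a node.

{{{
<!-- Cluster side (mapred-site.xml): one map slot represents 1 GB. -->
<property>
  <name>mapred.cluster.map.memory.mb</name>
  <value>1024</value>
</property>

<!-- Job side: request 2 GB per map task; each task is then charged
     two slots, halving the number that can run on a node. -->
<property>
  <name>mapred.job.map.memory.mb</name>
  <value>2048</value>
</property>
}}}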
  
