[Hadoop Wiki] Update of "FAQ" by SomeOtherAccount

Apache Wiki Fri, 22 Oct 2010 08:32:23 -0700

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change 
notification.


The "FAQ" page has been changed by SomeOtherAccount.
http://wiki.apache.org/hadoop/FAQ?action=diff&rev1=79&rev2=80

--------------------------------------------------

  hadoop job -kill JOBID
  }}}
  
+ == How do I limit the total number of tasks my job may run concurrently? ==
+ 
+ Typically, when this question is asked, it is because a job references 
something external to Hadoop that has some sort of limit on it, such as a 
database it reads from or writes to.  In Hadoop terms, we call this a 
'side-effect'.
+ 
+ One of the general assumptions of the framework is that tasks have no 
side-effects. All tasks are expected to be restartable, and a side-effect 
typically goes against the grain of this rule, because a restarted task 
repeats whatever it did to the external system.
+ 
+ If a task absolutely must break the rules, there are a few things one can do:
+ 
+ * Deploy ZooKeeper and use it as a persistent lock to keep track of how many 
tasks are running concurrently
+ * Use a scheduler with a maximum-tasks-per-queue feature and submit the job 
to that queue
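+ 
+ The counting idea behind the first option can be sketched as follows. This is only an illustration under stated assumptions: it uses a local java.util.concurrent.Semaphore inside one JVM as a stand-in for a ZooKeeper-backed counter, and the class and method names are hypothetical. In a real cluster the permit would have to live in ZooKeeper (for example as ephemeral nodes, which disappear when a task's session ends) so that a failed task does not hold its permit forever.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

/** Local stand-in for a cluster-wide task limit; every name here is hypothetical. */
public class ConcurrencyLimitSketch {

    /**
     * Starts `tasks` threads that each need one of `limit` permits before
     * touching the external resource. Returns the highest number of threads
     * that were inside the guarded section at the same time.
     */
    static int run(int tasks, int limit) {
        // In a real cluster this counter would be kept in ZooKeeper, not in one JVM.
        Semaphore permits = new Semaphore(limit);
        AtomicInteger active = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();

        Thread[] threads = new Thread[tasks];
        for (int i = 0; i < tasks; i++) {
            threads[i] = new Thread(() -> {
                try {
                    permits.acquire();               // wait until a permit is free
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                try {
                    int now = active.incrementAndGet();
                    peak.accumulateAndGet(now, Math::max);
                    Thread.sleep(10);                // placeholder for the external call
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    active.decrementAndGet();
                    permits.release();               // always give the permit back
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return peak.get();
    }
}
```

+ Calling ConcurrencyLimitSketch.run(8, 3) starts eight simulated tasks but never lets more than three of them reach the external call at once.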
+ 
+ == How do I limit the number of tasks my job may run concurrently on a 
given node? ==
+ 
+ The CapacityScheduler in 0.21 has a feature whereby one may use RAM-per-task 
to limit how many slots a given task takes.  By careful use of this feature, 
one may limit how many of a job's tasks run concurrently on a given node.
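+ 
+ The arithmetic behind this can be sketched as follows. It assumes that a task requesting more memory than one slot provides is charged a whole number of slots, rounded up; the class, method, and parameter names are hypothetical and are not scheduler configuration keys, so check the CapacityScheduler documentation for the actual settings.

```java
/** Back-of-the-envelope slot arithmetic; every name here is hypothetical. */
public class SlotMathSketch {

    /** Slots charged to one task: its memory request divided by slot size, rounded up. */
    static int slotsPerTask(int taskMemoryMb, int slotMemoryMb) {
        return (taskMemoryMb + slotMemoryMb - 1) / slotMemoryMb;
    }

    /** How many such tasks fit on a node that offers nodeSlots slots. */
    static int maxTasksPerNode(int nodeSlots, int taskMemoryMb, int slotMemoryMb) {
        return nodeSlots / slotsPerTask(taskMemoryMb, slotMemoryMb);
    }
}
```

+ Under these assumptions, a node with 8 slots of 1024 MB each runs at most 2 tasks of a job that requests 4096 MB per task, since each task occupies 4 slots.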
+ 
  = HDFS =
  
  == If I add new DataNodes to the cluster will HDFS move the blocks to the 
newly added nodes in order to balance disk space utilization between the nodes? 
==
