Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.
The "FAQ" page has been changed by SomeOtherAccount.
http://wiki.apache.org/hadoop/FAQ?action=diff&rev1=79&rev2=80

--------------------------------------------------

  hadoop job -kill JOBID
  }}}

+ == How do I limit the number of concurrent tasks my job may have running total at a time? ==
+
+ Typically when this question is asked, it is because a job is referencing something external to Hadoop that has some sort of limit on it, such as reading from or writing to a database. In Hadoop terms, we call this a 'side-effect'.
+
+ One of the general assumptions of the framework is that there are no side-effects. All tasks are expected to be restartable, and a side-effect typically goes against the grain of this rule.
+
+ If a task absolutely must break the rules, there are a few things one can do:
+
+  * Deploy ZooKeeper and use it as a persistent lock to keep track of how many tasks are running concurrently
+  * Use a scheduler with a maximum task-per-queue feature and submit the job to that queue
+
+ == How do I limit the number of concurrent tasks my job may have running on a given node at a time? ==
+
+ The CapacityScheduler in 0.21 has a feature whereby one may use RAM-per-task to limit how many slots a given task takes. By careful use of this feature, one may limit how many concurrent tasks a job may run on a given node.
+
  = HDFS =
  == If I add new DataNodes to the cluster will HDFS move the blocks to the newly added nodes in order to balance disk space utilization between the nodes? ==
