On Sep 15, 2008, at 11:24 AM, Kayla Jay wrote:

> How does one check, or guarantee, that there's enough disk space when running a Hadoop job whose output size (temp files, etc.) you can't predict?

In 0.19 there is new code that waits until the first N% of maps have run and then estimates the amount of space required for each of the remaining tasks. You can see the discussion here:

https://issues.apache.org/jira/browse/HADOOP-657
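The estimation idea can be illustrated with a small sketch. It assumes the estimate is an output-to-input ratio observed on completed maps and then scaled to a pending task's input size. The names `CompletedMap`, `estimate_task_output` and the `min_fraction` threshold are hypothetical. This is not the code from HADOOP-657, only an illustration of extrapolating from the first N% of maps.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CompletedMap:
    """Hypothetical record of a finished map task."""
    input_bytes: int   # bytes the map task read
    output_bytes: int  # bytes of intermediate output it wrote


def estimate_task_output(completed: List[CompletedMap],
                         total_maps: int,
                         task_input_bytes: int,
                         min_fraction: float = 0.1) -> Optional[int]:
    """Estimate output bytes for one pending map task (illustrative only).

    Returns None until at least min_fraction of all maps have completed
    (the hypothetical "first N%"), because earlier guesses are unreliable.
    """
    if total_maps <= 0 or len(completed) < min_fraction * total_maps:
        return None
    total_in = sum(m.input_bytes for m in completed)
    total_out = sum(m.output_bytes for m in completed)
    if total_in == 0:
        return None
    ratio = total_out / total_in  # observed output bytes per input byte
    return int(task_input_bytes * ratio)
```

For example, with two of ten maps done, each producing half as many bytes as it read, a pending task with 400 input bytes is estimated at 200 output bytes. A scheduler could then avoid handing that task to a node with less free space than the estimate.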

The task tracker can also be configured with the mapred.local.dir.minspacestart property, which sets the minimum amount of free disk space (in bytes) in its local directories that must be available before it will ask for a new task.
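The gist of such a threshold check can be sketched as follows. The function name `can_request_task` is hypothetical. The sketch also assumes that the largest free space among the local directories is compared against the threshold and that a value of 0 disables the check. The real TaskTracker logic may differ in those details.

```python
import shutil
from typing import Iterable


def can_request_task(local_dirs: Iterable[str], min_space_start: int) -> bool:
    """Hypothetical mirror of the mapred.local.dir.minspacestart idea.

    Returns True if some local directory has at least min_space_start free
    bytes. A threshold of 0 or less is treated as "no check" (an assumption).
    """
    if min_space_start <= 0:
        return True
    # Largest free space across the configured local directories.
    largest_free = max((shutil.disk_usage(d).free for d in local_dirs), default=0)
    return largest_free >= min_space_start
```

A tracker loop would call this before each heartbeat that asks for work and simply skip the request while the disks are too full.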

> Or, what if you run out of disk space on HDFS while running large jobs with large outputs? The job just fails... but how can one assess disk-space allocation while the jobs are running?

Map/Reduce works by re-executing tasks that fail, including tasks that fail for lack of disk space. If a task fails, its partial results are erased on the assumption that it will be run again later. Tasks that finish will have their output in the output directory, even if the job as a whole fails.
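The re-execution and cleanup behavior can be pictured with a simplified sketch. Each attempt writes to a fresh scratch directory. A failed attempt's partial output is deleted, and only a successful attempt's output is moved into the output directory. The function `run_with_retries` and the default of four attempts are assumptions for illustration, not Hadoop's actual commit code.

```python
import os
import shutil
import tempfile
from typing import Callable


def run_with_retries(task: Callable[[str], None], output_dir: str, name: str,
                     max_attempts: int = 4) -> bool:
    """Run task(work_dir) until it succeeds or attempts run out (sketch).

    Failed attempts (e.g. OSError when the disk fills up) have their partial
    output removed; a successful attempt is moved to output_dir/name, so
    finished tasks keep their results even if the overall job later fails.
    """
    os.makedirs(output_dir, exist_ok=True)
    for _ in range(max_attempts):
        work_dir = tempfile.mkdtemp(prefix=f"{name}-attempt-")
        try:
            task(work_dir)
        except Exception:
            shutil.rmtree(work_dir, ignore_errors=True)  # discard partial results
            continue
        shutil.move(work_dir, os.path.join(output_dir, name))  # promote output
        return True
    return False
```

Keeping each attempt in its own scratch directory is what makes it safe to throw away a failed attempt without touching output from tasks that already finished.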

-- Owen
