    [ http://issues.apache.org/jira/browse/HADOOP-370?page=comments#action_12422378 ]

Doug Cutting commented on HADOOP-370:
-------------------------------------
Yes, let's cache the "good dirs". If a drive goes offline or becomes unwritable while a node is running, then we should start emitting warnings, but we should not warn more than once for drives that are offline or unwritable at startup.

> TaskTracker startup fails if any mapred.local.dir entries don't exist
> ---------------------------------------------------------------------
>
>                 Key: HADOOP-370
>                 URL: http://issues.apache.org/jira/browse/HADOOP-370
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>         Environment: ~30 node cluster, various size/number of disks, CPUs, memory
>            Reporter: Bryan Pendleton
>         Attachments: fix-freespace-tasktracker-failure.txt
>
> This appears to have been introduced with the "check for enough free space" before startup.
> It's debatable how best to fix this bug. I will submit a patch that ignores directories for which the DF utility fails. This lets me keep my cluster running (the number of drives varies across nodes, so mapred.local.dir contains entries for drives that aren't present on every cluster node), but a cleaner solution is probably better. I'd lean towards checking for existence and ignoring a dir that doesn't exist, rather than depending on DF to fail, since DF can fail for other reasons without meaning you're out of disk space. I argue that a TaskTracker should start up if *all* directories in the list that *can be written to* have enough space. Otherwise, a single failed drive per cluster machine means no work ever gets done.
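The behavior described above (filter mapred.local.dir down to the entries that exist and are writable, cache those "good dirs", warn once about dirs that are bad at startup, warn on each recheck about dirs that go bad while running, and apply the free-space check only to the usable set) could look roughly like the sketch below. This is a minimal illustration under stated assumptions, not the attached patch: the LocalDirChecker class name and its methods are invented for this example, and java.io.File.getUsableSpace() (Java 6+) stands in for Hadoop's DF utility so the example is self-contained.

import java.io.File;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: cache the "good" mapred.local.dir entries at
// startup and apply the free-space check only to those.
public class LocalDirChecker {

  private final List<String> goodDirs = new ArrayList<String>();
  private final Set<String> warnedAtStartup = new HashSet<String>();

  public LocalDirChecker(String[] configuredDirs) {
    for (String dirName : configuredDirs) {
      File dir = new File(dirName);
      if (dir.isDirectory() && dir.canWrite()) {
        goodDirs.add(dirName);   // cache the good dirs
      } else {
        warnOnce(dirName);       // bad at startup: warn only once
      }
    }
  }

  // Startup succeeds if every *usable* dir has enough space;
  // requiring all *configured* dirs would mean one failed drive per
  // machine stops all work.
  public boolean hasEnoughSpace(long minSpace) {
    for (String dirName : goodDirs) {
      // getUsableSpace() stands in for the DF utility here; we no
      // longer depend on DF failing to detect a missing drive.
      if (new File(dirName).getUsableSpace() < minSpace) {
        return false;
      }
    }
    return !goodDirs.isEmpty();  // at least one usable dir required
  }

  // Called periodically while the node is running: a previously good
  // dir that goes offline or unwritable starts emitting warnings.
  public void recheck() {
    for (String dirName : goodDirs) {
      File dir = new File(dirName);
      if (!dir.isDirectory() || !dir.canWrite()) {
        System.err.println("WARN: local dir " + dirName
            + " is no longer usable");
      }
    }
  }

  private void warnOnce(String dirName) {
    if (warnedAtStartup.add(dirName)) {
      System.err.println("WARN: ignoring unusable local dir " + dirName);
    }
  }
}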
