I have a Hadoop job that I have run successfully with an input set of about 50 million records. To test scaling, and to prepare for where we expect to be a year or two from now, I tried the same job with about four times as many records.
Most map tasks fail with the message:

    Could not find any valid local directory for taskTracker/jobcache/job.../jars

The first job writes about 4 TB and runs on a 0.23 cluster. My general understanding is that this message appears when a temporary directory on a local drive fills up. I have asked our systems group to restart the cluster. My questions are:

1) Are there commands to run on a slave to see the issue? (What I was planning to check is listed below.)
2) Will restarting the cluster clear things out and help?
3) Are there ways to tune the job to mitigate this issue? (A sketch of what I had in mind follows.)
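Regarding question 1, this is what I was planning to run on a slave, though I am not sure it is the right approach. The jobcache path below is a guess; the actual location depends on how mapred.local.dir is set on our cluster:

    # free space on each disk backing the local dirs
    df -h

    # size of the tasktracker's job cache (path is a guess; check
    # mapred.local.dir in mapred-site.xml for the real location)
    du -sh /tmp/hadoop-*/mapred/local/taskTracker/jobcache 2>/dev/null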
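Regarding question 3, this is the kind of tuning I had in mind. The property names are from the MRv1 docs and the paths are placeholders, so please correct me if these are not the right knobs:

    <!-- mapred-site.xml (sketch): compress intermediate map output to
         reduce local-disk usage, and spread local dirs across disks -->
    <property>
      <name>mapred.compress.map.output</name>
      <value>true</value>
    </property>
    <property>
      <name>mapred.local.dir</name>
      <value>/disk1/mapred/local,/disk2/mapred/local</value>
    </property>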