I have only seen that type of error when the tasktracker machine is very heavily loaded and the task does not exit in a timely manner after the tasktracker terminates it.
Is this error in your task log or in the tasktracker log?

On Fri, Jan 1, 2010 at 3:02 PM, himanshu chandola <[email protected]> wrote:

> Thanks.
>
> This is probably something trivial, but if you have any idea what could
> be causing this, it would be helpful. I changed mapred.local.dir to point
> to drives with bigger capacity. The map jobs start to fail with the
> following message:
>
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
> taskTracker/jobcache/job_200912311931_0002/attempt_200912311931_0002_m_000027_0/output/file.out.index
> in any of the configured local directories
>
> This is weird because the file in question exists on that machine in that
> directory (taskTracker/jobcache....). The permissions are also right, so I
> haven't been able to work out what the problem could be.
>
> Do you have any ideas on this?
>
> Thanks
>
> Morpheus: Do you believe in fate, Neo?
> Neo: No.
> Morpheus: Why not?
> Neo: Because I don't like the idea that I'm not in control of my life.
>
> ----- Original Message ----
> From: Jason Venner <[email protected]>
> To: [email protected]
> Sent: Thu, December 31, 2009 1:46:47 PM
> Subject: Re: large reducer output with same key
>
> The mapred.local.dir parameter is used by each tasktracker node to
> provide the directory (or directories) where transitory data about the
> tasks the tasktracker runs is stored. This includes the map output, and
> can be very large.
>
> On Thu, Dec 31, 2009 at 10:03 AM, himanshu chandola
> <[email protected]> wrote:
>
> > Hi Todd,
> > Are these directories supposed to be on the namenode or on each of the
> > datanodes? In my case it is set to a directory inside /tmp, but
> > mapred.local.dir was present only on the namenode.
> >
> > Thanks for the help
> >
> > Himanshu
> >
> > Morpheus: Do you believe in fate, Neo?
> > Neo: No.
> > Morpheus: Why not?
> > Neo: Because I don't like the idea that I'm not in control of my life.
> >
> > ----- Original Message ----
> > From: Todd Lipcon <[email protected]>
> > To: [email protected]
> > Sent: Thu, December 31, 2009 10:17:05 AM
> > Subject: Re: large reducer output with same key
> >
> > Hi Himanshu,
> >
> > Sounds like your mapred.local.dir doesn't have enough space. My guess is
> > that you've configured it somewhere inside /tmp/. Instead, you should
> > spread it across all of your local physical disks by comma-separating
> > the directories in the configuration. Something like:
> >
> > <property>
> >   <name>mapred.local.dir</name>
> >   <value>/disk1/mapred-local,/disk2/mapred-local,/disk3/mapred-local</value>
> > </property>
> >
> > (and of course make sure those directories exist and are writable by
> > the user that runs your hadoop daemons, often "hadoop")
> >
> > Thanks
> > -Todd
> >
> > On Thu, Dec 31, 2009 at 2:10 AM, himanshu chandola
> > <[email protected]> wrote:
> >
> > > Hi Everyone,
> > > My reducer output results in most of the data having the same key.
> > > The reducer output is close to 16 GB, and though my cluster has a
> > > terabyte of space in HDFS in total, I get errors like the following:
> > >
> > > > org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:719)
> > > > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:209)
> > > > at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2084)
> > > > Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException:
> > > > Could not find any valid local directory for
> > > > task_200808021906_0002_m_000014_2/spill4.out
> > >
> > > After such failures, hadoop tries to start the same reduce job a
> > > couple of times on other nodes before the job fails. From the
> > > exception, it looks to me like this is probably a disk error (some
> > > machines have less than 16 gigs of free space on hdfs).
> > >
> > > So my question is whether hadoop puts values which share the same key
> > > as a single block on one node, or whether something else could be
> > > happening here?
> > >
> > > Thanks
> > >
> > > H
>
> --
> Pro Hadoop, a book to guide you from beginner to hadoop mastery,
> http://www.amazon.com/dp/1430219424?tag=jewlerymall
> www.prohadoopbook.com a community for Hadoop Professionals

--
Pro Hadoop, a book to guide you from beginner to hadoop mastery,
http://www.amazon.com/dp/1430219424?tag=jewlerymall
www.prohadoopbook.com a community for Hadoop Professionals
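A note on the two DiskErrorException messages quoted in the thread: the tasktracker resolves relative paths such as taskTracker/jobcache/<job>/<attempt>/output/file.out.index by searching every entry of the comma-separated mapred.local.dir list, and when writing spill files it looks for an entry with enough free space. The sketch below is a simplified, hypothetical plain-Java illustration of that lookup, not Hadoop's actual LocalDirAllocator code; the class name, directory names, and sizes are made-up examples.

import java.io.File;

/*
 * Simplified sketch (not Hadoop's real implementation) of a lookup over the
 * comma-separated mapred.local.dir entries, showing how the two errors from
 * the thread can arise.
 */
public class LocalDirLookupSketch {

    // Find an existing file under any configured local directory.
    static File findExisting(String[] localDirs, String relativePath) {
        for (String dir : localDirs) {
            File candidate = new File(dir, relativePath);
            if (candidate.exists()) {
                return candidate;
            }
        }
        // Corresponds to "Could not find ... in any of the configured local directories".
        throw new RuntimeException("Could not find " + relativePath
                + " in any of the configured local directories");
    }

    // Pick a directory that is writable and has enough usable space for a new spill file.
    static File chooseForWrite(String[] localDirs, String relativePath, long bytesNeeded) {
        for (String dir : localDirs) {
            File root = new File(dir);
            if (root.isDirectory() && root.canWrite()
                    && root.getUsableSpace() > bytesNeeded) {
                return new File(root, relativePath);
            }
        }
        // Corresponds to "Could not find any valid local directory for ...spill4.out"
        // when every configured disk is missing, full, or unwritable.
        throw new RuntimeException("Could not find any valid local directory for "
                + relativePath);
    }

    public static void main(String[] args) {
        // Hypothetical mapred.local.dir value, comma-separated as in Todd's example.
        String[] dirs = "/disk1/mapred-local,/disk2/mapred-local,/disk3/mapred-local".split(",");
        try {
            System.out.println(chooseForWrite(dirs, "jobcache/job_x/spill0.out", 1L << 30));
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());
        }
    }
}

If the example disks do not exist or are full, the sketch prints the same "no valid local directory" style of message seen in the original stack trace, which is why pointing mapred.local.dir at more (or larger) disks makes the spill failures go away.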

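Todd's point about the directories existing and being writable by the daemon user also lends itself to a small preflight check. The following is a hedged plain-Java sketch that assumes the mapred.local.dir value is passed in as an argument (falling back to the example list from the thread); in a real cluster you would take the value from your hadoop-site.xml/mapred-site.xml instead.

import java.io.File;

// Preflight check for the entries of mapred.local.dir: does each directory
// exist, is it writable by the current user, and how much space is free?
public class LocalDirPreflight {
    public static void main(String[] args) {
        // Assumed example value; pass your real comma-separated list as the first argument.
        String mapredLocalDir = args.length > 0
                ? args[0]
                : "/disk1/mapred-local,/disk2/mapred-local,/disk3/mapred-local";

        for (String entry : mapredLocalDir.split(",")) {
            File dir = new File(entry.trim());
            long freeGb = dir.getUsableSpace() / (1024L * 1024L * 1024L);
            System.out.printf("%s exists=%b writable=%b freeGB=%d%n",
                    dir, dir.isDirectory(), dir.canWrite(), freeGb);
        }
    }
}

Run it as the same user that runs the tasktracker (often "hadoop") so the writable check reflects the daemon's permissions; any entry that reports false, or that has only a few gigabytes free, is a likely culprit for the spill failures discussed above.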