Hi, I'm having a similar problem, so I'll continue on this thread to describe my issue.
I ran an MR job that takes 70 GB of input and creates 1098 mappers and 100 reducers (on a 9-node Hadoop cluster), but the job fails and 4 datanodes die after a few minutes (their processes are still running, but the master recognizes them as dead). When I investigated the job, it looked like 20 mappers failed with these errors:

> ProcfsBasedProcessTree: java.io.IOException: Cannot run program "getconf":
> java.io.IOException: error=11, Resource temporarily unavailable
> ..
> OutOfMemoryError: unable to create new native thread
> ..
> # There is insufficient memory for the Java Runtime Environment to continue.
> # Cannot create GC thread. Out of system resources.

The reducers also failed because they couldn't retrieve the outputs of the failed mappers. My guess is that JVM memory somehow reaches its maximum, so the tasktrackers and datanodes can't create new threads and die. But given my lack of experience with Hadoop, I don't know what's actually causing it, and I don't have an answer yet.

Here are some configurations:

HADOOP_HEAPSIZE=4096
HADOOP_NAMENODE_OPTS = .. -Xmx2g ..
HADOOP_DATANODE_OPTS = .. -Xmx4g ..
HADOOP_JOBTRACKER_OPTS = .. -Xmx4g ..
dfs.datanode.max.xcievers = 60000
mapred.child.java.opts = -Xmx400m
mapred.tasktracker.map.tasks.maximum = 14
mapred.tasktracker.reduce.tasks.maximum = 14

I've also attached the logs. If anyone knows the answer, please let me know. I'd appreciate any help with this.

Best regards,
Ben

On Fri, Jun 15, 2012 at 1:05 PM, Harsh J <ha...@cloudera.com> wrote:
> Do you ship a lot of dist-cache files or perhaps have a bad
> mapred.child.java.opts parameter?
>
> On Fri, Jun 15, 2012 at 1:39 AM, Shamshad Ansari <sans...@apixio.com> wrote:
> > Hi All,
> > When I run hadoop jobs, I observe the following errors. Also, I notice that
> > data node dies every time the job is initiated.
> >
> > Does any one know what may be causing this and how to solve this?
> >
> > ======================
> >
> > 12/06/14 19:57:17 INFO input.FileInputFormat: Total input paths to process : 1
> > 12/06/14 19:57:17 INFO mapred.JobClient: Running job: job_201206141136_0002
> > 12/06/14 19:57:18 INFO mapred.JobClient: map 0% reduce 0%
> > 12/06/14 19:57:27 INFO mapred.JobClient: Task Id : attempt_201206141136_0002_m_000001_0, Status : FAILED
> > java.lang.Throwable: Child Error
> >     at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271)
> > Caused by: java.io.IOException: Task process exit with nonzero status of 1.
> >     at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:258)
> >
> > 12/06/14 19:57:27 WARN mapred.JobClient: Error reading task outputhttp://node1:50060/tasklog?plaintext=true&attemptid=attempt_201206141136_0002_m_000001_0&filter=stdout
> > 12/06/14 19:57:27 WARN mapred.JobClient: Error reading task outputhttp://node1:50060/tasklog?plaintext=true&attemptid=attempt_201206141136_0002_m_000001_0&filter=stderr
> > 12/06/14 19:57:33 INFO mapred.JobClient: Task Id : attempt_201206141136_0002_r_000002_0, Status : FAILED
> > java.lang.Throwable: Child Error
> >     at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271)
> > Caused by: java.io.IOException: Task process exit with nonzero status of 1.
> >     at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:258)
> >
> > 12/06/14 19:57:33 WARN mapred.JobClient: Error reading task outputhttp://node1:50060/tasklog?plaintext=true&attemptid=attempt_201206141136_0002_r_000002_0&filter=stdout
> > 12/06/14 19:57:33 WARN mapred.JobClient: Error reading task outputhttp://node1:50060/tasklog?plaintext=true&attemptid=attempt_201206141136_0002_r_000002_0&filter=stderr
> > ^Chadoop@ip-10-174-87-251:~/apixio-pipeline/pipeline-trigger$ 12/06/14 19:57:27 WARN mapred.JobClient: Error reading task outputhttp:/node1:50060/sklog?plaintext=true&attemptid=attempt_201206141136_0002_m_000001_0&filter=stdout
> >
> > Thank you,
> > --Shamshad
> >
>
> --
> Harsh J
>

--
Benjamin Kim
benkimkimben at gmail
Attachments: datanode.log, mapper.log, reducer.log, tasktracker.log
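As a rough sanity check on the memory guess in Ben's message, the worker-node settings he listed can be summed into a worst-case heap budget for a single node. This is a minimal Python sketch under stated assumptions: a TaskTracker and a DataNode run on every worker, all 14 map and 14 reduce slots are busy at once, and the TaskTracker's heap falls back to HADOOP_HEAPSIZE because no HADOOP_TASKTRACKER_OPTS value was listed. The function name `worker_heap_budget_mb` is hypothetical. It counts only the -Xmx heap ceilings; thread stacks, other non-heap JVM memory, and the OS itself come on top, and the thread does not say how much RAM each node has.

```python
# Hypothetical helper: worst-case -Xmx heap budget for one worker node,
# using the values quoted in this thread. Non-heap memory (thread stacks,
# permgen, native buffers) is NOT included.

def worker_heap_budget_mb(map_slots, reduce_slots, child_xmx_mb,
                          datanode_xmx_mb, tasktracker_xmx_mb):
    """Sum the maximum heap sizes of all JVMs that may run on one worker."""
    child_total = (map_slots + reduce_slots) * child_xmx_mb
    return {
        "child_tasks": child_total,
        "datanode": datanode_xmx_mb,
        "tasktracker": tasktracker_xmx_mb,
        "total": child_total + datanode_xmx_mb + tasktracker_xmx_mb,
    }


budget = worker_heap_budget_mb(
    map_slots=14,              # mapred.tasktracker.map.tasks.maximum
    reduce_slots=14,           # mapred.tasktracker.reduce.tasks.maximum
    child_xmx_mb=400,          # mapred.child.java.opts = -Xmx400m
    datanode_xmx_mb=4 * 1024,  # HADOOP_DATANODE_OPTS ... -Xmx4g
    tasktracker_xmx_mb=4096,   # assumption: TaskTracker uses HADOOP_HEAPSIZE=4096
)

for name, mb in budget.items():
    print(f"{name:12s} {mb:6d} MB")
```

With these values the 28 child JVMs alone may claim up to 11,200 MB (28 × 400 MB), and the total comes to 19,392 MB of heap per node before any non-heap memory is counted. Comparing that figure with each node's physical RAM would show whether the slot counts are plausible for the hardware.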