Hi
I'm having a similar problem, so I'll continue on this thread to describe
my issue.

I ran an MR job that takes 70GB of input and creates 1098 mappers and 100
reducers (on a 9-node Hadoop cluster), but the job fails and 4 datanodes die
after a few minutes (the processes are still running, but the master
recognizes them as dead).
When I investigated the job, it looked like 20 mappers failed with these
errors:

> ProcfsBasedProcessTree: java.io.IOException: Cannot run program "getconf":
> java.io.IOException: error=11, Resource temporarily unavailable
> ..
> OutOfMemoryError: unable to create new native thread
> ..
> # There is insufficient memory for the Java Runtime Environment to continue.
> # Cannot create GC thread. Out of system resources.


The reducers also fail because they can't retrieve the outputs of the failed
mappers.
My guess is that somehow a JVM reaches its memory limit, so the tasktrackers
and datanodes can't create new threads and die.
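For what it's worth, `error=11` is `EAGAIN`, and "unable to create new native thread" can also mean the per-user process/thread limit (`ulimit -u`, i.e. `RLIMIT_NPROC`, which on Linux counts threads as well as processes) has been reached, rather than the Java heap filling up. A minimal sketch for checking this, assuming Linux and Python 3: run it as the user that owns the DataNode and TaskTracker processes and pass their PIDs (for example, from `jps`). The helper names are hypothetical, not part of Hadoop.

```python
import os
import resource
import sys


def nproc_limit():
    """Return the (soft, hard) RLIMIT_NPROC for the current user."""
    return resource.getrlimit(resource.RLIMIT_NPROC)


def fmt(limit: int) -> str:
    """Render a limit, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)


def thread_count(pid: int) -> int:
    """Read the 'Threads:' field from /proc/<pid>/status (Linux only)."""
    with open(f"/proc/{pid}/status") as status:
        for line in status:
            if line.startswith("Threads:"):
                return int(line.split()[1])
    raise RuntimeError(f"no Threads: field for pid {pid}")


if __name__ == "__main__":
    soft, hard = nproc_limit()
    print(f"RLIMIT_NPROC soft={fmt(soft)} hard={fmt(hard)}")
    # PIDs of the DataNode/TaskTracker (e.g. from `jps`); with no arguments,
    # fall back to this script's own PID as a smoke test.
    pids = [int(arg) for arg in sys.argv[1:]] or [os.getpid()]
    for pid in pids:
        print(f"pid {pid}: {thread_count(pid)} threads")
```

Note that the limit applies to the total across all processes of that user, so the thread counts of the DataNode, the TaskTracker and all child JVMs running as the same user add up against it.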

But since I lack experience with Hadoop, I don't know what's actually
causing it, and I don't have an answer yet.

Here are some of the *configurations*:
HADOOP_HEAPSIZE=4096
HADOOP_NAMENODE_OPTS = .. -Xmx2g ..
HADOOP_DATANODE_OPTS = .. -Xmx4g ..
HADOOP_JOBTRACKER_OPTS = .. -Xmx4g ..

dfs.datanode.max.xcievers = 60000
mapred.child.java.opts = -Xmx400m
mapred.tasktracker.map.tasks.maximum = 14
mapred.tasktracker.reduce.tasks.maximum = 14
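A rough per-node budget for the settings above can be worked out as follows. This is a back-of-the-envelope sketch, not an exact accounting: it assumes every one of the 9 nodes runs both a DataNode and a TaskTracker, that the TaskTracker heap comes from HADOOP_HEAPSIZE (4096 MiB), that each Java thread stack is 1 MiB, and it ignores non-heap JVM overhead. `xmx_to_mb` is a hypothetical helper.

```python
import math
import re


def xmx_to_mb(value: str) -> int:
    """Convert a JVM -Xmx argument such as '-Xmx400m' or '-Xmx4g' to MiB."""
    m = re.fullmatch(r"-Xmx(\d+)([kKmMgG]?)", value.strip())
    if m is None:
        raise ValueError(f"not a valid -Xmx setting: {value!r}")
    amount, unit = int(m.group(1)), m.group(2).lower()
    if unit == "g":
        return amount * 1024
    if unit == "m":
        return amount
    if unit == "k":
        return amount // 1024
    return amount // (1024 * 1024)  # no suffix means bytes


NODES = 9                                  # assumption: DataNode + TaskTracker on every node
MAP_SLOTS = 14                             # mapred.tasktracker.map.tasks.maximum
REDUCE_SLOTS = 14                          # mapred.tasktracker.reduce.tasks.maximum
CHILD_HEAP_MB = xmx_to_mb("-Xmx400m")      # mapred.child.java.opts
DATANODE_HEAP_MB = xmx_to_mb("-Xmx4g")     # HADOOP_DATANODE_OPTS
TASKTRACKER_HEAP_MB = 4096                 # assumption: TaskTracker uses HADOOP_HEAPSIZE
MAX_XCIEVERS = 60000                       # dfs.datanode.max.xcievers
STACK_MB = 1                               # assumption: 1 MiB per thread stack

# Heap that all child JVMs on one node may claim when every slot is busy.
child_heap_mb = (MAP_SLOTS + REDUCE_SLOTS) * CHILD_HEAP_MB
per_node_heap_mb = child_heap_mb + DATANODE_HEAP_MB + TASKTRACKER_HEAP_MB
# How many rounds of tasks the job needs across the whole cluster.
map_waves = math.ceil(1098 / (NODES * MAP_SLOTS))
reduce_waves = math.ceil(100 / (NODES * REDUCE_SLOTS))
# Stack space reserved if the DataNode ever runs every allowed xceiver thread.
xceiver_stack_mb = MAX_XCIEVERS * STACK_MB

print(f"child JVM heap per node:   {child_heap_mb} MiB")
print(f"total max heap per node:   {per_node_heap_mb} MiB")
print(f"map waves / reduce waves:  {map_waves} / {reduce_waves}")
print(f"worst-case xceiver stacks: {xceiver_stack_mb} MiB")
```

Under these assumptions the child JVMs alone can claim 11,200 MiB of heap per node and the two daemons another 8,192 MiB, before thread stacks and other native memory are counted; if a worker has less free RAM than that, forking helpers and starting new threads may fail. The xceiver figure is a worst-case reservation of virtual address space, not resident memory.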

I've also attached the *logs*.

If anyone knows the answer, please let me know.
I would appreciate any help on this.

Best regards,
Ben

On Fri, Jun 15, 2012 at 1:05 PM, Harsh J <ha...@cloudera.com> wrote:

> Do you ship a lot of dist-cache files or perhaps have a bad
> mapred.child.java.opts parameter?
>
> On Fri, Jun 15, 2012 at 1:39 AM, Shamshad Ansari <sans...@apixio.com> wrote:
> > Hi All,
> > When I run Hadoop jobs, I observe the following errors. I also notice that
> > the data node dies every time the job is started.
> >
> > Does anyone know what may be causing this and how to solve it?
> >
> > ======================
> >
> > 12/06/14 19:57:17 INFO input.FileInputFormat: Total input paths to process : 1
> > 12/06/14 19:57:17 INFO mapred.JobClient: Running job: job_201206141136_0002
> > 12/06/14 19:57:18 INFO mapred.JobClient:  map 0% reduce 0%
> > 12/06/14 19:57:27 INFO mapred.JobClient: Task Id : attempt_201206141136_0002_m_000001_0, Status : FAILED
> > java.lang.Throwable: Child Error
> >         at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271)
> > Caused by: java.io.IOException: Task process exit with nonzero status of 1.
> >         at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:258)
> >
> > 12/06/14 19:57:27 WARN mapred.JobClient: Error reading task outputhttp://node1:50060/tasklog?plaintext=true&attemptid=attempt_201206141136_0002_m_000001_0&filter=stdout
> > 12/06/14 19:57:27 WARN mapred.JobClient: Error reading task outputhttp://node1:50060/tasklog?plaintext=true&attemptid=attempt_201206141136_0002_m_000001_0&filter=stderr
> > 12/06/14 19:57:33 INFO mapred.JobClient: Task Id : attempt_201206141136_0002_r_000002_0, Status : FAILED
> > java.lang.Throwable: Child Error
> >         at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271)
> > Caused by: java.io.IOException: Task process exit with nonzero status of 1.
> >         at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:258)
> >
> > 12/06/14 19:57:33 WARN mapred.JobClient: Error reading task outputhttp://node1:50060/tasklog?plaintext=true&attemptid=attempt_201206141136_0002_r_000002_0&filter=stdout
> > 12/06/14 19:57:33 WARN mapred.JobClient: Error reading task outputhttp://node1:50060/tasklog?plaintext=true&attemptid=attempt_201206141136_0002_r_000002_0&filter=stderr
> > ^Chadoop@ip-10-174-87-251:~/apixio-pipeline/pipeline-trigger$ 12/06/14 19:57:27 WARN mapred.JobClient: Error reading task outputhttp:/node1:50060/sklog?plaintext=true&attemptid=attempt_201206141136_0002_m_000001_0&filter=stdout
> >
> > Thank you,
> > --Shamshad
> >
>
>
>
> --
> Harsh J
>



-- 

*Benjamin Kim*
*benkimkimben at gmail*

Attachments: datanode.log, mapper.log, reducer.log, tasktracker.log
