Hi,
I've also run into the same "nonzero status of 1" error before.
What did you set mapred.child.ulimit and mapred.child.java.opts to?
mapred.child.ulimit must be greater than the -Xmx passed to the JVM,
or else the VM might not start. That's what the MR tutorial says.
After setting a bigger ulimit, the problem went away for me.
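
To make the ulimit vs. -Xmx comparison concrete, here is a rough sketch of the check. It assumes mapred.child.ulimit is given in kilobytes and that the heap size appears as a plain -Xmx flag in mapred.child.java.opts. The function names are made up for illustration and are not part of Hadoop. Passing this check is only a minimum requirement, since the JVM needs virtual memory beyond its heap.

```python
import re

# Multipliers for the optional -Xmx size suffix (no suffix means bytes).
_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def xmx_bytes(java_opts):
    """Return the -Xmx heap size in bytes, or None if no -Xmx flag is present."""
    match = re.search(r"-Xmx(\d+)([kKmMgG]?)", java_opts)
    if match is None:
        return None
    return int(match.group(1)) * _UNITS[match.group(2).lower()]


def ulimit_allows_heap(ulimit_kb, java_opts):
    """True if the ulimit (in KB, an assumption) is strictly above the heap size.

    Hypothetical helper: it only checks the rule quoted from the tutorial,
    not the JVM's full virtual memory footprint.
    """
    heap = xmx_bytes(java_opts)
    if heap is None:
        return True  # no explicit heap setting to compare against
    return ulimit_kb * 1024 > heap
```

For example, ulimit_allows_heap(204800, "-Xmx200m") is False, because 204800 KB is exactly 200 MB and leaves no headroom above the heap.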
Hope this helps.
Regards,
Manhee
----- Original Message -----
From: "Edward Capriolo" <[email protected]>
To: <[email protected]>
Sent: Tuesday, June 15, 2010 2:47 AM
Subject: Re: Task process exit with nonzero status of 1 - deleting userlogs helps
On Mon, Jun 14, 2010 at 1:15 PM, Johannes Zillmann <[email protected]> wrote:
Hi,
I am running a 4-node cluster with hadoop-0.20.2. I have suddenly run into
a situation where every task scheduled on 2 of the 4 nodes fails.
It seems the child JVM crashes. There are no child logs under
logs/userlogs. The TaskTracker log gives this:
2010-06-14 09:34:12,714 INFO org.apache.hadoop.mapred.JvmManager: In JvmRunner constructed JVM ID: jvm_201006091425_0049_m_-946174604
2010-06-14 09:34:12,714 INFO org.apache.hadoop.mapred.JvmManager: JVM Runner jvm_201006091425_0049_m_-946174604 spawned.
2010-06-14 09:34:12,727 INFO org.apache.hadoop.mapred.JvmManager: JVM : jvm_201006091425_0049_m_-946174604 exited. Number of tasks it ran: 0
2010-06-14 09:34:12,727 WARN org.apache.hadoop.mapred.TaskRunner: attempt_201006091425_0049_m_003179_0 Child Error
java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)
At some point I simply renamed logs/userlogs to logs/userlogsOLD. A new
job created logs/userlogs again, and no error occurred anymore on this
host.
The permissions of userlogs and userlogsOLD are exactly the same.
userlogsOLD contains about 378M in 132747 files. When I copy the content
of userlogsOLD into userlogs, the tasks on that node start failing
again.
Some questions:
- This looks to me like a problem with too many files in one folder. Any
thoughts on this?
- Is the content of logs/userlogs cleaned up by Hadoop regularly?
- The logs/stdout files of the tasks do not exist, and the logs/out files
of the tasktracker have no specific message (other than the messages posted
above). Is there any other log file where an error message could be found?
best regards
Johannes
Most file systems have an upper limit on the number of files/folders in a
single folder. You have probably hit the ext3 limit (roughly 32,000
subdirectories per directory). If you launch lots and lots of
jobs you can hit the limit before any cleanup happens.
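
One quick way to test this theory is to count the subdirectories directly under logs/userlogs on a failing node. The sketch below assumes the commonly cited ext3 ceiling of 31,998 subdirectories per directory. The helper names and the 90% warning threshold are arbitrary choices for illustration, not anything Hadoop provides.

```python
import os

# Assumed ext3 ceiling: 32,000 links per inode leaves 31,998 subdirectories.
EXT3_MAX_SUBDIRS = 31998


def count_subdirs(path):
    """Count the immediate subdirectories of path (symlinks are not followed)."""
    with os.scandir(path) as entries:
        return sum(1 for entry in entries if entry.is_dir(follow_symlinks=False))


def near_limit(path, limit=EXT3_MAX_SUBDIRS, threshold=0.9):
    """True if path holds at least threshold * limit subdirectories."""
    return count_subdirs(path) >= limit * threshold
```

Running near_limit("logs/userlogs") from the Hadoop install directory on an affected node would show whether the directory is close to that ceiling.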
You can experiment with cleanup and with other filesystems. The following
log-related issue might be relevant.
https://issues.apache.org/jira/browse/MAPREDUCE-323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877614#action_12877614
Regards,
Edward