Re: Task process exit with nonzero status of 1 - deleting userlogs helps

Manhee Jo Wed, 16 Jun 2010 00:31:24 -0700

Hi,

I've also encountered the same "nonzero status of 1" error before.
What did you set mapred.child.ulimit and mapred.child.java.opts to?
mapred.child.ulimit (specified in KB) must be greater than or equal to the
-Xmx passed to the JavaVM, else the VM might not start. That's what the MR
tutorial says. Setting a bigger ulimit solved the problem for me.
Hope this helps.
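
In case it is useful, here is a minimal Python sketch of that check. The
function names are hypothetical (they are not part of Hadoop), and it only
compares the two configured values: since the ulimit caps the virtual memory
of the whole child process, not just the heap, a value equal to -Xmx is the
bare minimum and some headroom is probably wise.

```python
import re

# Multipliers from an -Xmx suffix to bytes; no suffix means plain bytes.
_SUFFIX_BYTES = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def xmx_kb(java_opts):
    """Return the -Xmx heap size in KB from a JVM options string, or None."""
    matches = re.findall(r"-Xmx(\d+)([kKmMgG]?)(?=\s|$)", java_opts)
    if not matches:
        return None
    # If -Xmx is given more than once, use the last occurrence.
    number, suffix = matches[-1]
    size_bytes = int(number) * _SUFFIX_BYTES[suffix.lower()]
    return -(-size_bytes // 1024)  # ceiling division to whole KB


def ulimit_allows_heap(ulimit_kb, java_opts):
    """True if the ulimit (in KB) is at least as large as the -Xmx heap."""
    heap_kb = xmx_kb(java_opts)
    return heap_kb is None or ulimit_kb >= heap_kb


# Example values only: pass in whatever your own configuration contains.
print(xmx_kb("-Xmx200m"))                      # 204800
print(ulimit_allows_heap(100000, "-Xmx200m"))  # False
```

Feed it the values of mapred.child.ulimit and mapred.child.java.opts from
your own configuration.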



Regards,
Manhee

----- Original Message -----
From: "Edward Capriolo" <[email protected]>
To: <[email protected]>
Sent: Tuesday, June 15, 2010 2:47 AM
Subject: Re: Task process exit with nonzero status of 1 - deleting userlogs helps

On Mon, Jun 14, 2010 at 1:15 PM, Johannes Zillmann <[email protected]> wrote:

Hi,

I am running a 4-node cluster with hadoop-0.20.2. Now I have suddenly run into
a situation where every task scheduled on 2 of the 4 nodes fails.
Seems like the child jvm crashes. There are no child logs under
logs/userlogs. Tasktracker gives this:

2010-06-14 09:34:12,714 INFO org.apache.hadoop.mapred.JvmManager: In
JvmRunner constructed JVM ID: jvm_201006091425_0049_m_-946174604
2010-06-14 09:34:12,714 INFO org.apache.hadoop.mapred.JvmManager: JVM
Runner jvm_201006091425_0049_m_-946174604 spawned.
2010-06-14 09:34:12,727 INFO org.apache.hadoop.mapred.JvmManager: JVM :
jvm_201006091425_0049_m_-946174604 exited. Number of tasks it ran: 0
2010-06-14 09:34:12,727 WARN org.apache.hadoop.mapred.TaskRunner:
attempt_201006091425_0049_m_003179_0 Child Error
java.io.IOException: Task process exit with nonzero status of 1.
       at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)

At some point I simply renamed logs/userlogs to logs/userlogsOLD. A new job
created logs/userlogs again, and no error occurred anymore on this host.

The permissions of userlogs and userlogsOLD are exactly the same.

userlogsOLD contains about 378M in 132747 files. When I copy the content of
userlogsOLD into userlogs, the tasks on that node start failing again.

Some questions:
- This seems to me like a problem with too many files in one folder. Any
thoughts on this?
- Is the content of logs/userlogs cleaned up by hadoop regularly?
- The logs/stdout files of the tasks do not exist, and the logs/out files of
the tasktracker contain no specific message (other than the message posted
above). Is there any log file left where an error message could be found?



best regards
Johannes



Most file systems have an upper limit on number of subfiles/folders in a
folder. You have probably hit the EXT3 limit. If you launch lots and lots of
jobs you can hit the limit before any cleanup happens.
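
To see how close a node is to that limit, something like the following
Python sketch could help. It assumes that each task attempt gets its own
subdirectory under logs/userlogs, and it uses the ext3 figure of 31998
subdirectories per directory (ext3 caps a directory at 32000 links). The
function names are hypothetical.

```python
import os

# ext3 allows 32000 links per directory, which leaves room for
# 31998 subdirectories. Other filesystems have different limits.
EXT3_MAX_SUBDIRS = 31998


def count_subdirs(path):
    """Count the immediate subdirectories of path (symlinks not followed)."""
    with os.scandir(path) as entries:
        return sum(1 for e in entries if e.is_dir(follow_symlinks=False))


def subdir_headroom(path, limit=EXT3_MAX_SUBDIRS):
    """Return how many more subdirectories fit under path before the limit."""
    return limit - count_subdirs(path)


# Example (the path is the one from this thread, relative to the Hadoop home):
# print(subdir_headroom("logs/userlogs"))
```

A result at or near zero would be consistent with the child JVM failing
because its log directory cannot be created.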

You can experiment with cleanup and other filesystems. The following
log-related issue might be relevant:

https://issues.apache.org/jira/browse/MAPREDUCE-323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877614#action_12877614
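
As a starting point for experimenting with cleanup, here is a Python sketch
of a dry run that only lists candidate directories and deletes nothing. It
assumes one subdirectory per task attempt under logs/userlogs and uses the
directory modification time as the age; the function name and any age
threshold you pass in are hypothetical and not taken from Hadoop.

```python
import os
import time


def stale_dirs(root, max_age_hours, now=None):
    """Return sorted names of subdirectories of root older than max_age_hours."""
    now = time.time() if now is None else now
    cutoff = now - max_age_hours * 3600
    stale = []
    with os.scandir(root) as entries:
        for entry in entries:
            if not entry.is_dir(follow_symlinks=False):
                continue  # skip plain files and symlinks
            if entry.stat(follow_symlinks=False).st_mtime < cutoff:
                stale.append(entry.name)
    return sorted(stale)


# Example: list attempt directories untouched for more than a day.
# for name in stale_dirs("logs/userlogs", 24):
#     print(name)
```

Review the list before removing anything, and make sure no running task
still writes to those directories.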

Regards,
Edward
