Re: Task process exit with nonzero status of 1 - deleting userlogs helps

Johannes Zillmann Thu, 17 Jun 2010 02:14:24 -0700

This looks like a limit on the number of entries in a folder.
Tried:
  cp -r logs/userlogsOLD/* logs/userlogs/
and got
  cp: cannot create directory `logs/userlogs/attempt_201006091425_0049_m_003169_0': Too many links
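
"Too many links" is the EMLINK error, returned when a directory has reached
the filesystem's hard-link limit and no further subdirectory can be created.
A minimal sketch for checking how close a directory such as logs/userlogs is
to that limit, assuming ext3 (where the cap works out to 31998
subdirectories). The function names are hypothetical and not part of Hadoop:

```python
import os

# Assumption: ext3, which caps a directory's link count at 32000. Two links
# are taken by the directory's entry in its parent and its own ".", and each
# subdirectory adds one more through its "..", so 31998 subdirectories fit.
EXT3_MAX_SUBDIRS = 31998


def count_subdirs(path):
    """Count the immediate subdirectories of path, not following symlinks."""
    with os.scandir(path) as entries:
        return sum(1 for e in entries if e.is_dir(follow_symlinks=False))


def subdir_headroom(path, limit=EXT3_MAX_SUBDIRS):
    """Return how many more subdirectories fit before the limit is reached."""
    return limit - count_subdirs(path)
```

Calling subdir_headroom("logs/userlogs") on an affected node and getting 0
would be consistent with the cp error above.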


Johannes

On Jun 16, 2010, at 9:30 AM, Manhee Jo wrote:

> Hi,
> 
> I've also encountered the same nonzero status of 1 error before.
> What did you set mapred.child.ulimit and mapred.child.java.opts to?
> mapred.child.ulimit must be greater than the -Xmx passed to the Java VM,
> else the VM might not start. That's what the MR tutorial says.
> After setting a bigger ulimit, the problem went away for me.
> Hope this helps.
> 
> 
> Regards,
> Manhee
> 
> ----- Original Message ----- From: "Edward Capriolo" <[email protected]>
> To: <[email protected]>
> Sent: Tuesday, June 15, 2010 2:47 AM
> Subject: Re: Task process exit with nonzero status of 1 - deleting userlogs helps
> 
> 
>> On Mon, Jun 14, 2010 at 1:15 PM, Johannes Zillmann <[email protected]> wrote:
>> 
>>> Hi,
>>> 
>>> I am running a 4-node cluster with hadoop-0.20.2. I have suddenly run into
>>> a situation where every task scheduled on 2 of the 4 nodes fails.
>>> It seems the child JVM crashes. There are no child logs under
>>> logs/userlogs. The TaskTracker log shows this:
>>> 
>>> 2010-06-14 09:34:12,714 INFO org.apache.hadoop.mapred.JvmManager: In
>>> JvmRunner constructed JVM ID: jvm_201006091425_0049_m_-946174604
>>> 2010-06-14 09:34:12,714 INFO org.apache.hadoop.mapred.JvmManager: JVM
>>> Runner jvm_201006091425_0049_m_-946174604 spawned.
>>> 2010-06-14 09:34:12,727 INFO org.apache.hadoop.mapred.JvmManager: JVM :
>>> jvm_201006091425_0049_m_-946174604 exited. Number of tasks it ran: 0
>>> 2010-06-14 09:34:12,727 WARN org.apache.hadoop.mapred.TaskRunner:
>>> attempt_201006091425_0049_m_003179_0 Child Error
>>> java.io.IOException: Task process exit with nonzero status of 1.
>>>       at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)
>>> 
>>> 
>>> At some point I simply renamed logs/userlogs to logs/userlogsOLD. A new job
>>> created logs/userlogs again and no error occurred anymore on this host.
>>> The permissions of userlogs and userlogsOLD are exactly the same.
>>> userlogsOLD contains about 378M in 132747 files. When I copy the content of
>>> userlogsOLD into userlogs, the tasks on that node start failing
>>> again.
>>> 
>>> Some questions:
>>> - This seems to me like a problem with too many files in one folder. Any
>>> thoughts on this?
>>> - Is the content of logs/userlogs cleaned up by Hadoop regularly?
>>> - The logs/stdout files of the tasks do not exist, and the logs/out files of
>>> the tasktracker contain no specific message (other than the message posted
>>> above). Is there any other log file where an error message could be found?
>>> 
>>> 
>>> best regards
>>> Johannes
>> 
>> 
>> Most file systems have an upper limit on number of subfiles/folders in a
>> folder. You have probably hit the EXT3 limit. If you launch lots and lots of
>> jobs you can hit the limit before any cleanup happens.
>> 
>> You can experiment with cleanup and other filesystems. The following log
>> related issue might be relevant.
>> 
>> https://issues.apache.org/jira/browse/MAPREDUCE-323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877614#action_12877614
>> 
>> Regards,
>> Edward
> 
>
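
The mapred.child.ulimit advice quoted above can be checked mechanically.
A minimal sketch, assuming mapred.child.ulimit is given in KB (as the Hadoop
0.20 configuration describes it) and that -Xmx uses the usual k/m/g suffixes
with a bare number meaning bytes. The helper names are hypothetical:

```python
import re

# Bytes per -Xmx suffix; a bare number is interpreted by the JVM as bytes.
_BYTES_PER_UNIT = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def xmx_kb(java_opts):
    """Return the -Xmx value found in java_opts in KB, or None if absent."""
    match = re.search(r"-Xmx(\d+)([kKmMgG]?)", java_opts)
    if match is None:
        return None
    size_bytes = int(match.group(1)) * _BYTES_PER_UNIT[match.group(2).lower()]
    return size_bytes // 1024


def ulimit_ok(ulimit_kb, java_opts):
    """True if the ulimit (KB) is greater than the requested heap, or no -Xmx is set."""
    heap_kb = xmx_kb(java_opts)
    return heap_kb is None or ulimit_kb > heap_kb
```

For example, ulimit_ok(204800, "-Xmx512m") is False, because 512 MB is
524288 KB and exceeds the 204800 KB limit; in that case the child JVM may
fail to start.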
