Seems like this is something with folder restrictions.

Tried:
    cp -r logs/userlogsOLD/* logs/userlogs/
and got:
    cp: cannot create directory `logs/userlogs/attempt_201006091425_0049_m_003169_0': Too many links
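
"Too many links" is what ext3 reports once a directory has reached its hard-link limit of 32000, which leaves room for 31998 subdirectories. A rough sketch for checking how close a userlogs folder is to that limit follows. The helper names and the 90% margin are made up for illustration, and it assumes the logs sit on ext3:

```python
import os

# ext3 allows at most 32000 hard links per inode. A directory already uses
# two (its "." entry and its entry in the parent), so it can hold at most
# 31998 subdirectories.
EXT3_MAX_SUBDIRS = 31998


def count_subdirs(path):
    """Return the number of immediate subdirectories of path."""
    with os.scandir(path) as entries:
        return sum(1 for e in entries if e.is_dir(follow_symlinks=False))


def near_subdir_limit(path, limit=EXT3_MAX_SUBDIRS, margin=0.9):
    """Return True if path holds at least margin * limit subdirectories."""
    return count_subdirs(path) >= margin * limit


if __name__ == "__main__":
    userlogs = "logs/userlogs"  # path used in this thread; adjust per node
    if os.path.isdir(userlogs):
        n = count_subdirs(userlogs)
        print(f"{userlogs}: {n} subdirectories (ext3 limit {EXT3_MAX_SUBDIRS})")
        if near_subdir_limit(userlogs):
            print("close to the limit, old attempt directories should be cleaned up")
```

Only directories count against the limit, so the 132747 files reported for userlogsOLD do not by themselves say how many attempt directories there are.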
Johannes

On Jun 16, 2010, at 9:30 AM, Manhee Jo wrote:

> Hi,
>
> I've also encountered the same nonzero status of 1 error before.
> What did you set to mapred.child.ulimit and mapred.child.java.opts?
> mapred.child.ulimit must be greater than the -Xmx passed to JavaVM,
> else the VM might not start. That's what the MR tutorial says.
> Setting a bigger ulimit, I could solve the problem.
> Hope this helps.
>
>
> Regards,
> Manhee
>
> ----- Original Message ----- From: "Edward Capriolo" <[email protected]>
> To: <[email protected]>
> Sent: Tuesday, June 15, 2010 2:47 AM
> Subject: Re: Task process exit with nonzero status of 1 - deleting userlogs helps
>
>
>> On Mon, Jun 14, 2010 at 1:15 PM, Johannes Zillmann <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> I am running a 4-node cluster with hadoop-0.20.2. Now I suddenly run into
>>> a situation where every task scheduled on 2 of the 4 nodes fails.
>>> Seems like the child jvm crashes. There are no child logs under
>>> logs/userlogs. Tasktracker gives this:
>>>
>>> 2010-06-14 09:34:12,714 INFO org.apache.hadoop.mapred.JvmManager: In
>>> JvmRunner constructed JVM ID: jvm_201006091425_0049_m_-946174604
>>> 2010-06-14 09:34:12,714 INFO org.apache.hadoop.mapred.JvmManager: JVM
>>> Runner jvm_201006091425_0049_m_-946174604 spawned.
>>> 2010-06-14 09:34:12,727 INFO org.apache.hadoop.mapred.JvmManager: JVM :
>>> jvm_201006091425_0049_m_-946174604 exited. Number of tasks it ran: 0
>>> 2010-06-14 09:34:12,727 WARN org.apache.hadoop.mapred.TaskRunner:
>>> attempt_201006091425_0049_m_003179_0 Child Error
>>> java.io.IOException: Task process exit with nonzero status of 1.
>>> at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)
>>>
>>>
>>> At some point I simply renamed logs/userlogs to logs/userlogsOLD. A new job
>>> created logs/userlogs again and no error occurred anymore on this host.
>>> The permissions of userlogs and userlogsOLD are exactly the same.
>>> userlogsOLD contains about 378M in 132747 files. When copying the content of
>>> userlogsOLD into userlogs, the tasks of that node start failing
>>> again.
>>>
>>> Some questions:
>>> - this seems to me like a problem with too many files in one folder - any
>>> thoughts on this ?
>>> - is the content of logs/userlogs cleaned up by hadoop regularly ?
>>> - the logs/stdout files of the tasks do not exist, the logs/out files of
>>> the tasktracker don't have any specific message (other than the message posted
>>> above) - is there any log file left where an error message could be found ?
>>>
>>>
>>> best regards
>>> Johannes
>>
>>
>> Most file systems have an upper limit on the number of subfiles/folders in a
>> folder. You have probably hit the EXT3 limit. If you launch lots and lots of
>> jobs you can hit the limit before any cleanup happens.
>>
>> You can experiment with cleanup and other filesystems. The following log
>> related issue might be relevant.
>>
>> https://issues.apache.org/jira/browse/MAPREDUCE-323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877614#action_12877614
>>
>> Regards,
>> Edward
>
>
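
On Manhee's point about mapred.child.ulimit versus -Xmx: the two settings can be compared mechanically. Below is a minimal sketch, assuming mapred.child.ulimit is given in kilobytes (as the MR tutorial specifies) and that the heap size appears as a plain -Xmx<number>[k|m|g] option in mapred.child.java.opts. The function names and the example values are made up for illustration:

```python
import re

# Multipliers for the optional size suffix on -Xmx (no suffix means bytes).
_UNIT_BYTES = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def xmx_bytes(java_opts):
    """Return the first -Xmx heap size in java_opts in bytes, or None."""
    match = re.search(r"-Xmx(\d+)([kKmMgG]?)", java_opts)
    if match is None:
        return None
    number, unit = match.groups()
    return int(number) * _UNIT_BYTES[unit.lower()]


def ulimit_covers_heap(ulimit_kb, java_opts):
    """Return True if the ulimit (in KB) is greater than the -Xmx heap size.

    This is only the necessary condition quoted in this thread; the JVM
    needs address space beyond the heap, so passing it does not guarantee
    that the child JVM starts.
    """
    heap = xmx_bytes(java_opts)
    if heap is None:
        return True  # no explicit -Xmx, nothing to compare against
    return ulimit_kb * 1024 > heap


if __name__ == "__main__":
    # Example values only, not taken from the cluster in this thread.
    print(ulimit_covers_heap(1048576, "-Xmx512m"))  # 1 GB ulimit, 512 MB heap
    print(ulimit_covers_heap(102400, "-Xmx200m"))   # 100 MB ulimit, 200 MB heap
```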
