On Mon, Jun 14, 2010 at 1:15 PM, Johannes Zillmann <[email protected]> wrote:

> Hi,
>
> I am running a 4-node cluster with hadoop-0.20.2. Now I have suddenly run
> into a situation where every task scheduled on 2 of the 4 nodes fails.
> It seems the child JVM crashes. There are no child logs under
> logs/userlogs. The tasktracker log shows this:
>
> 2010-06-14 09:34:12,714 INFO org.apache.hadoop.mapred.JvmManager: In
> JvmRunner constructed JVM ID: jvm_201006091425_0049_m_-946174604
> 2010-06-14 09:34:12,714 INFO org.apache.hadoop.mapred.JvmManager: JVM
> Runner jvm_201006091425_0049_m_-946174604 spawned.
> 2010-06-14 09:34:12,727 INFO org.apache.hadoop.mapred.JvmManager: JVM :
> jvm_201006091425_0049_m_-946174604 exited. Number of tasks it ran: 0
> 2010-06-14 09:34:12,727 WARN org.apache.hadoop.mapred.TaskRunner:
> attempt_201006091425_0049_m_003179_0 Child Error
> java.io.IOException: Task process exit with nonzero status of 1.
>        at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)
>
>
> At some point I simply renamed logs/userlogs to logs/userlogsOLD. A new job
> created logs/userlogs again, and no errors occurred on this host anymore.
> The permissions of userlogs and userlogsOLD are exactly the same.
> userlogsOLD contains about 378M in 132747 files. When I copy the content of
> userlogsOLD back into userlogs, the tasks on that node start failing
> again.
>
> Some questions:
> - This looks to me like a problem with too many files in one folder. Any
> thoughts on this?
> - Is the content of logs/userlogs cleaned up by Hadoop regularly?
> - The logs/stdout files of the tasks do not exist, and the logs/out files of
> the tasktracker contain no specific message (other than the one posted
> above). Is there any other log file where an error message could be found?
>
>
> Best regards
> Johannes


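Most file systems have an upper limit on the number of entries (files or
subdirectories) in a single directory. You have probably hit the ext3 limit:
ext3 caps a directory's link count at 32000, which allows at most 31998
subdirectories, and logs/userlogs gets one subdirectory per task attempt. If
you launch many jobs, you can hit the limit before any cleanup happens.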
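
A quick way to check whether a node is close to that limit is to count the
subdirectories in logs/userlogs. The sketch below is a hypothetical Python
helper (`count_subdirs` and `check_userlogs` are illustrative names, not part
of Hadoop). It assumes the ext3 cap of 31998 subdirectories and treats 90% of
that as a warning threshold, which is an arbitrary choice.

```python
import os

# Assumption: ext3 limits a directory's link count to 32000. The directory's
# own "." and its entry in the parent use two links, and each subdirectory's
# ".." uses one more, leaving room for at most 31998 subdirectories.
EXT3_MAX_SUBDIRS = 31998


def count_subdirs(path):
    """Count the immediate subdirectories of `path` (not recursive)."""
    with os.scandir(path) as entries:
        return sum(1 for e in entries if e.is_dir(follow_symlinks=False))


def check_userlogs(path, limit=EXT3_MAX_SUBDIRS, warn_ratio=0.9):
    """Return (count, status), where status is 'ok', 'warning' or 'full'."""
    n = count_subdirs(path)
    if n >= limit:
        return n, "full"
    if n >= warn_ratio * limit:
        return n, "warning"
    return n, "ok"
```

For example, call `check_userlogs("logs/userlogs")` from the Hadoop directory
on each node. A count at or near 31998 on exactly the two failing nodes would
support the ext3 explanation, since creating one more subdirectory would then
fail with EMLINK ("Too many links").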

You can experiment with more frequent cleanup or with other filesystems. The
following log-related issue might be relevant:

https://issues.apache.org/jira/browse/MAPREDUCE-323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877614#action_12877614
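
If you want to experiment with cleanup yourself, something along these lines
could run from cron. It is a minimal sketch, not Hadoop's own cleanup:
`purge_old_attempt_dirs` is a hypothetical helper, and it assumes each
attempt's logs live in a subdirectory named after the attempt ID (e.g.
attempt_201006091425_0049_m_003179_0) and that a 24-hour age cutoff suits you.
It defaults to a dry run that only reports what it would delete.

```python
import os
import shutil
import time


def purge_old_attempt_dirs(userlogs, max_age_hours=24.0, prefix="attempt_",
                           dry_run=True, now=None):
    """Remove per-attempt log directories older than max_age_hours.

    Returns the sorted names of directories that were removed (or, with
    dry_run=True, would be removed). Entries not starting with `prefix`
    are never touched.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_hours * 3600
    stale = []
    with os.scandir(userlogs) as entries:
        for e in entries:
            if not e.name.startswith(prefix):
                continue
            if not e.is_dir(follow_symlinks=False):
                continue
            # Directory mtime changes only when entries are added or removed.
            if e.stat(follow_symlinks=False).st_mtime < cutoff:
                stale.append(e.name)
    if not dry_run:
        for name in stale:
            shutil.rmtree(os.path.join(userlogs, name))
    return sorted(stale)
```

Because a directory's mtime only changes when entries are added to or removed
from it, a long-running attempt can look older than it is; keep
max_age_hours comfortably above your longest task.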

Regards,
Edward
