Re: Task process exit with nonzero status of 1 - deleting userlogs helps

Amareshwari Sri Ramadasu Wed, 16 Jun 2010 00:23:59 -0700

The issue is fixed in branch 0.21 through 
http://issues.apache.org/jira/browse/MAPREDUCE-927.
The attempt directories are now moved inside the job directory, so the 
userlogs directory will contain only job directories.
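
To illustrate the layout change, here is a minimal Python sketch (not the
MAPREDUCE-927 code itself). It assumes the usual attempt ID format seen in
the log below, attempt_<jobtracker id>_<job number>_<m|r>_<task>_<attempt>,
and shows how many attempt directories collapse into a few job directories.
The function names are hypothetical.

```python
import re
from collections import defaultdict

# Assumed attempt ID shape, e.g. attempt_201006091425_0049_m_003179_0
ATTEMPT_RE = re.compile(r"^attempt_(\d+)_(\d+)_([mr])_(\d+)_(\d+)$")


def job_id_for_attempt(attempt_id):
    """Derive the job ID (job_<jobtracker id>_<job number>) from an attempt ID."""
    match = ATTEMPT_RE.match(attempt_id)
    if match is None:
        raise ValueError("not a task attempt ID: %r" % attempt_id)
    return "job_{}_{}".format(match.group(1), match.group(2))


def group_by_job(attempt_ids):
    """Group attempt IDs under their job ID, mirroring a job/attempt layout."""
    grouped = defaultdict(list)
    for attempt_id in attempt_ids:
        grouped[job_id_for_attempt(attempt_id)].append(attempt_id)
    return dict(grouped)
```

With a flat layout, every attempt adds one entry directly under userlogs.
Grouped this way, the first level grows only with the number of jobs.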


Thanks
Amareshwari
On 6/16/10 12:47 PM, "Johannes Zillmann" <[email protected]> wrote:

Hi Edward,

I copied the userlogs folder which caused the error.
Two things speak against the too-many-files theory:
a) I can add new files to this folder (touch userlogsOLD/a, etc.)
b) sysctl fs.file-max shows 817874, whereas the file count on the first 
level of userlogsOLD is 31999 and the recursive count of all files is 107400.

Any thoughts?
Johannes
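
One way to test the ext3 theory from Edward's reply (quoted below) is to
count first-level subdirectories separately from regular files, since touch
creates only a regular file. The Python sketch below is illustrative only. It
assumes the filesystem is ext3 and relies on the commonly documented ext3
limit of 32000 links per inode, which leaves room for 31998 subdirectories in
one directory. The function names are hypothetical.

```python
import os

# Assumption: ext3 allows 32000 links per inode. A directory already uses two
# links (its entry in the parent and its own "."), and each subdirectory adds
# one more through its "..", so at most 31998 subdirectories fit.
EXT3_MAX_SUBDIRS = 31998


def count_entries(path):
    """Return (subdirectories, other entries) for the first level of path."""
    dirs = others = 0
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                dirs += 1
            else:
                others += 1
    return dirs, others


def at_subdir_limit(path, limit=EXT3_MAX_SUBDIRS):
    """True if path already holds at least `limit` subdirectories."""
    dirs, _ = count_entries(path)
    return dirs >= limit
```

For example, count_entries("logs/userlogsOLD") would show how many of the
first-level entries are directories. If at_subdir_limit() returns True,
creating one more attempt directory would fail even though regular files can
still be added.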


On Jun 14, 2010, at 7:47 PM, Edward Capriolo wrote:

> On Mon, Jun 14, 2010 at 1:15 PM, Johannes Zillmann <[email protected]> wrote:
>
>> Hi,
>>
>> I am running a 4-node cluster with hadoop-0.20.2. I have suddenly run into
>> a situation where every task scheduled on 2 of the 4 nodes fails.
>> It seems the child JVM crashes. There are no child logs under
>> logs/userlogs. The TaskTracker log shows this:
>>
>> 2010-06-14 09:34:12,714 INFO org.apache.hadoop.mapred.JvmManager: In
>> JvmRunner constructed JVM ID: jvm_201006091425_0049_m_-946174604
>> 2010-06-14 09:34:12,714 INFO org.apache.hadoop.mapred.JvmManager: JVM
>> Runner jvm_201006091425_0049_m_-946174604 spawned.
>> 2010-06-14 09:34:12,727 INFO org.apache.hadoop.mapred.JvmManager: JVM :
>> jvm_201006091425_0049_m_-946174604 exited. Number of tasks it ran: 0
>> 2010-06-14 09:34:12,727 WARN org.apache.hadoop.mapred.TaskRunner:
>> attempt_201006091425_0049_m_003179_0 Child Error
>> java.io.IOException: Task process exit with nonzero status of 1.
>>       at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)
>>
>>
>> At some point I simply renamed logs/userlogs to logs/userlogsOLD. A new job
>> created logs/userlogs again and no further errors occurred on this host.
>> The permissions of userlogs and userlogsOLD are exactly the same.
>> userlogsOLD contains about 378M in 132747 files. When I copy the content of
>> userlogsOLD into userlogs, the tasks on that node start failing
>> again.
>>
>> Some questions:
>> - This seems to me like a problem with too many files in one folder. Any
>> thoughts on this?
>> - Is the content of logs/userlogs cleaned up by Hadoop regularly?
>> - The logs/stdout files of the tasks do not exist, and the logs/out files of
>> the tasktracker contain no specific message (other than the message posted
>> above). Is there any other log file where an error message could be found?
>>
>>
>> best regards
>> Johannes
>
>
> Most file systems have an upper limit on the number of files and subfolders
> in a folder. You have probably hit the EXT3 limit. If you launch lots and
> lots of jobs you can hit the limit before any cleanup happens.
>
> You can experiment with cleanup and other filesystems. The following
> log-related issue might be relevant.
>
> https://issues.apache.org/jira/browse/MAPREDUCE-323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877614#action_12877614
>
> Regards,
> Edward
