[ https://issues.apache.org/jira/browse/PIG-4043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046993#comment-14046993 ]
Cheolsoo Park commented on PIG-4043:
------------------------------------

{quote}
I think the OOM is because there are two huge arrays at the same time, unlike Hadoop 1.x HadoopShims.
{quote}

This isn't true. In fact, I am seeing the OOM in 0.12, which doesn't include the code you're referring to (introduced by PIG-3913). In 0.12, there are no two copies of the TaskReport array. If you look at the heap dump, it is a single array object that is as big as 800MB.

In addition, I see the same issue in Lipstick, for example [here|https://github.com/Netflix/Lipstick/blob/master/lipstick-console/src/main/java/com/netflix/lipstick/pigtolipstick/BasicP2LClient.java#L414]. Pig dies as soon as {{JobClient.getMapTaskReports()}} is called.

I've been running several tests so far. It's clear that I cannot run my job (100K mappers) with any {{JobClient.getMapTaskReports()}} call, in either Pig or Lipstick, on Hadoop 2.4. Unless {{JobClient.getMapTaskReports()}} itself returns an iterator, we need a way of disabling it.

> JobClient.getMap/ReduceTaskReports() causes OOM for jobs with a large number
> of tasks
> -------------------------------------------------------------------------------------
>
>                 Key: PIG-4043
>                 URL: https://issues.apache.org/jira/browse/PIG-4043
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Cheolsoo Park
>            Assignee: Cheolsoo Park
>             Fix For: 0.14.0
>
>         Attachments: PIG-4043-1.patch, heapdump.png
>
>
> With Hadoop 2.4, I often see the Pig client fail due to OOM when there are many
> tasks (~100K) with a 1GB heap size.
> The heap dump (attached) shows that TaskReport[] occupies about 80% of heap
> space at the time of OOM.
> The problem is that JobClient.getMap/ReduceTaskReports() returns an array of
> TaskReport objects, which can be huge if the number of tasks is large.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
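The array-vs-iterator distinction the comment asks for can be sketched as follows. This is a minimal, self-contained illustration, not Hadoop code: {{TaskReport}} here is a hypothetical stand-in for Hadoop's much heavier class (which also carries diagnostics, counters, and progress), and both methods are illustrative shapes, not real {{JobClient}} APIs.

```java
import java.util.Iterator;
import java.util.stream.IntStream;

public class TaskReportSketch {
    // Hypothetical stand-in for Hadoop's TaskReport. The real class is far
    // heavier per instance, which is why 100K of them can fill a 1GB heap.
    static final class TaskReport {
        final String taskId;
        TaskReport(String taskId) { this.taskId = taskId; }
    }

    // Array-style shape, like JobClient.getMapTaskReports(): every report is
    // materialized at once. With ~100K tasks, this single allocation is what
    // shows up as the ~800MB TaskReport[] in the heap dump.
    static TaskReport[] getAllReports(int numTasks) {
        TaskReport[] reports = new TaskReport[numTasks];
        for (int i = 0; i < numTasks; i++) {
            reports[i] = new TaskReport("task_" + i);
        }
        return reports;
    }

    // Iterator-style shape: only one report is live at a time, so heap usage
    // stays O(1) in the number of tasks instead of O(n).
    static Iterator<TaskReport> getReportIterator(int numTasks) {
        return IntStream.range(0, numTasks)
                .mapToObj(i -> new TaskReport("task_" + i))
                .iterator();
    }

    public static void main(String[] args) {
        long count = 0;
        Iterator<TaskReport> it = getReportIterator(100_000);
        while (it.hasNext()) {
            it.next(); // each report is eligible for GC after this step
            count++;
        }
        System.out.println(count); // prints 100000
    }
}
```

The caller still sees every report; the only difference is that the iterator never holds the full set in memory, which is why an iterator-returning API (or a switch to skip the call entirely) would let a 100K-mapper job finish with a 1GB client heap.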