[ https://issues.apache.org/jira/browse/PIG-4043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046993#comment-14046993 ]
Cheolsoo Park commented on PIG-4043:
------------------------------------

{quote}
I think the OOM is because there are two huge arrays at the same time, unlike Hadoop 1.x HadoopShims.
{quote}

This isn't true. In fact, I am seeing the OOM in 0.12, which doesn't include the code you're referring to (introduced by PIG-3913). In 0.12, there are no two copies of the TaskReport array. If you look at the heap dump, it is a single array object that is as big as 800MB.

In addition, I see the same issue in Lipstick, for example [here|https://github.com/Netflix/Lipstick/blob/master/lipstick-console/src/main/java/com/netflix/lipstick/pigtolipstick/BasicP2LClient.java#L414]. Pig dies as soon as {{JobClient.getMapTaskReports()}} is called.

I've been running several tests so far. It's clear that I cannot run my job (100K mappers) with any {{JobClient.getMapTaskReports()}} call, in either Pig or Lipstick, on Hadoop 2.4. Unless {{JobClient.getMapTaskReports()}} itself returns an iterator, we need a way of disabling it.

> JobClient.getMap/ReduceTaskReports() causes OOM for jobs with a large number
> of tasks
> -------------------------------------------------------------------------------------
>
>                 Key: PIG-4043
>                 URL: https://issues.apache.org/jira/browse/PIG-4043
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Cheolsoo Park
>            Assignee: Cheolsoo Park
>             Fix For: 0.14.0
>
>         Attachments: PIG-4043-1.patch, heapdump.png
>
>
> With Hadoop 2.4, I often see the Pig client fail due to OOM when there are many
> tasks (~100K) with a 1GB heap size.
> The heap dump (attached) shows that TaskReport[] occupies about 80% of heap
> space at the time of OOM.
> The problem is that JobClient.getMap/ReduceTaskReports() returns an array of
> TaskReport objects, which can be huge if the number of tasks is large.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
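The array-vs-iterator distinction the comment asks for can be sketched as follows. This is a minimal, self-contained illustration, not Hadoop code: {{TaskReport}} here is a hypothetical stand-in for Hadoop's much heavier class (which also carries diagnostics, counters, and progress), and both methods are illustrative shapes, not real {{JobClient}} APIs.

```java
import java.util.Iterator;
import java.util.stream.IntStream;

public class TaskReportSketch {
    // Hypothetical stand-in for Hadoop's TaskReport. The real class is far
    // heavier per instance, which is why 100K of them can fill a 1GB heap.
    static final class TaskReport {
        final String taskId;
        TaskReport(String taskId) { this.taskId = taskId; }
    }

    // Array-style shape, like JobClient.getMapTaskReports(): every report is
    // materialized at once. With ~100K tasks, this single allocation is what
    // shows up as the ~800MB TaskReport[] in the heap dump.
    static TaskReport[] getAllReports(int numTasks) {
        TaskReport[] reports = new TaskReport[numTasks];
        for (int i = 0; i < numTasks; i++) {
            reports[i] = new TaskReport("task_" + i);
        }
        return reports;
    }

    // Iterator-style shape: only one report is live at a time, so heap usage
    // stays O(1) in the number of tasks instead of O(n).
    static Iterator<TaskReport> getReportIterator(int numTasks) {
        return IntStream.range(0, numTasks)
                .mapToObj(i -> new TaskReport("task_" + i))
                .iterator();
    }

    public static void main(String[] args) {
        long count = 0;
        Iterator<TaskReport> it = getReportIterator(100_000);
        while (it.hasNext()) {
            it.next(); // each report is eligible for GC after this step
            count++;
        }
        System.out.println(count); // prints 100000
    }
}
```

The caller still sees every report; the only difference is that the iterator never holds the full set in memory, which is why an iterator-returning API (or a switch to skip the call entirely) would let a 100K-mapper job finish with a 1GB client heap.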