[ 
https://issues.apache.org/jira/browse/PIG-5083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-5083:
------------------------------------
    Attachment: PIG-5083-1.patch

While analyzing the heap dump from an OOM in the Combiner, I found that while 
deserializing the next value in NullableTuple 
{code}
mValue = bis.readTuple(in);
{code}

the previous value of mValue could not be garbage collected, because the field 
still references it until readTuple returns. With DISTINCT inside a nested 
foreach and pig.exec.mapPartAgg=true, that value can be a very large bag and 
can lead to OOM. This patch fixes that as well.
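
A minimal sketch of the idea, not the actual patch: clearing the field before 
reading lets the collector reclaim the old value while the new one is being 
built. The class ReusableValueSketch and its helpers are hypothetical stand-ins 
and are not Pig's NullableTuple or BinInterSedes API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a reusable writable wrapper (assumption, not Pig code).
class ReusableValueSketch {
    // Plays the role of mValue: the wrapper is reused across records.
    private List<Long> value;

    // Reads the next value. The old value is dropped first, so it is
    // unreachable (and collectable) while the new one is being allocated.
    void readFields(DataInput in) throws IOException {
        value = null;
        value = readList(in);
    }

    // Hypothetical deserializer: an int count followed by that many longs.
    private static List<Long> readList(DataInput in) throws IOException {
        int n = in.readInt();
        List<Long> out = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            out.add(in.readLong());
        }
        return out;
    }

    List<Long> get() {
        return value;
    }

    // Helper that encodes values in the format readList expects.
    static byte[] encode(long... xs) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream dos = new DataOutputStream(bos);
        dos.writeInt(xs.length);
        for (long x : xs) {
            dos.writeLong(x);
        }
        dos.flush();
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        ReusableValueSketch w = new ReusableValueSketch();
        w.readFields(new DataInputStream(new ByteArrayInputStream(encode(1L, 2L, 3L))));
        w.readFields(new DataInputStream(new ByteArrayInputStream(encode(7L))));
        System.out.println(w.get());
    }
}
```

Without the explicit null assignment, the old and new values are both reachable 
for the duration of the read, which roughly doubles peak memory for large bags.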

I just noticed that the same change applies to LitePackager as well and added 
it to the patch, so I am rerunning the full unit and e2e tests now. 

> CombinerPackager and LitePackager should not materialize bags
> -------------------------------------------------------------
>
>                 Key: PIG-5083
>                 URL: https://issues.apache.org/jira/browse/PIG-5083
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Rohini Palaniswamy
>            Assignee: Rohini Palaniswamy
>             Fix For: 0.17.0
>
>         Attachments: PIG-5083-1.patch
>
>
> Before PIG-3591 and the creation of CombinerPackager, POCombinerPackage 
> read directly from the combiner/reducer input instead of materializing the bag.
> https://github.com/apache/pig/blob/branch-0.12/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POCombinerPackage.java#L140-L161
> The unnecessary materialization leads to a lot of spills and, in some cases, OOMs.
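
To illustrate the difference the description refers to, here is a small sketch 
contrasting a materialized bag with direct consumption of the input iterator. 
The class and method names are hypothetical and do not reflect Pig's packager 
classes; a plain Iterator<Long> stands in for the combiner/reducer input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical illustration only; not Pig's CombinerPackager or LitePackager.
class StreamVsMaterializeSketch {
    // Materializing: every value is copied into an in-memory list first,
    // so memory grows with the number of values for the key.
    static long sumMaterialized(Iterator<Long> values) {
        List<Long> bag = new ArrayList<>();
        while (values.hasNext()) {
            bag.add(values.next());
        }
        long sum = 0;
        for (long v : bag) {
            sum += v;
        }
        return sum;
    }

    // Streaming: values are consumed straight from the input iterator,
    // so extra memory stays constant regardless of input size.
    static long sumStreaming(Iterator<Long> values) {
        long sum = 0;
        while (values.hasNext()) {
            sum += values.next();
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Long> input = Arrays.asList(1L, 2L, 3L, 4L);
        System.out.println(sumMaterialized(input.iterator()));
        System.out.println(sumStreaming(input.iterator()));
    }
}
```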



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
