[ 
https://issues.apache.org/jira/browse/PIG-5083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15805903#comment-15805903
 ] 

Rohini Palaniswamy edited comment on PIG-5083 at 1/6/17 10:09 PM:
------------------------------------------------------------------

While analyzing the heap dump from an OOM in the Combiner, I found that while 
deserializing the next value in NullableTuple 
{code}
mValue = bis.readTuple(in);
{code}

the previous value of mValue was still strongly referenced and could not be 
garbage collected. With DISTINCT inside a nested foreach and 
pig.exec.mapPartAgg=true, that value can be a very large bag that takes up a 
lot of memory and can lead to OOM while the next tuple is being deserialized in 
bis.readTuple. That is also fixed in this patch.
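
As a rough illustration of the retention problem (not the code from the patch), 
the sketch below clears the field before reading the next value, so the old 
value is no longer reachable through the holder while the new one is being 
built. ValueHolder, readValue, input and encode are hypothetical names standing 
in for NullableTuple and bis.readTuple, and a long[] plays the role of the tuple.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical stand-in for a writable wrapper such as NullableTuple.
// This is NOT Pig's class; it only illustrates the reference-lifetime issue.
class ValueHolder {
    // Stands in for a potentially very large tuple or bag.
    private long[] mValue;

    // Hypothetical reader standing in for bis.readTuple(in).
    private static long[] readValue(DataInput in) throws IOException {
        int n = in.readInt();
        long[] v = new long[n];
        for (int i = 0; i < n; i++) {
            v[i] = in.readLong();
        }
        return v;
    }

    void readFields(DataInput in) throws IOException {
        // Without this line, the old array stays strongly reachable through
        // mValue until the assignment below completes, so the old and the new
        // value must both fit in the heap at the same time.
        mValue = null;
        mValue = readValue(in);
    }

    long[] get() {
        return mValue;
    }

    // Helper (assumption for this sketch): serialize values as count + longs.
    static byte[] encode(long... values) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(values.length);
        for (long v : values) {
            out.writeLong(v);
        }
        out.flush();
        return bytes.toByteArray();
    }

    // Helper (assumption for this sketch): wrap bytes as a DataInput.
    static DataInput input(byte[] bytes) {
        return new DataInputStream(new ByteArrayInputStream(bytes));
    }

    public static void main(String[] args) throws IOException {
        ValueHolder holder = new ValueHolder();
        holder.readFields(input(encode(1L, 2L, 3L)));
        holder.readFields(input(encode(4L)));
        System.out.println(holder.get().length);
    }
}
```

A side effect of this ordering is that the holder is left with a null value if 
the read fails partway, which a caller reusing the object would need to 
tolerate.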

Just noticed that it could be applied to LitePackager as well and added that to 
the patch. So rerunning the full unit and e2e tests now. 


was (Author: rohini):
While analyzing the heapdump during OOM for the Combiner, found that while 
deserializing next value in NullableTuple 
{code}
mValue = bis.readTuple(in);
{code}

the previous value of mValue could not be collected. With DISTINCT inside a 
nested foreach and pig.exec.mapPartAgg=true, that could be a really big bag and 
can lead to OOM. That is also fixed in this patch.

Just noticed that it could be applied to LitePackager as well and added that to 
the patch. So rerunning the full unit and e2e tests now. 

> CombinerPackager and LitePackager should not materialize bags
> -------------------------------------------------------------
>
>                 Key: PIG-5083
>                 URL: https://issues.apache.org/jira/browse/PIG-5083
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Rohini Palaniswamy
>            Assignee: Rohini Palaniswamy
>             Fix For: 0.17.0
>
>         Attachments: PIG-5083-1.patch
>
>
> Before PIG-3591 and creation of CombinerPackager, POCombinerPackage directly 
> read from the combiner/reducer input instead of materializing the bag.
> https://github.com/apache/pig/blob/branch-0.12/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POCombinerPackage.java#L140-L161
> The unnecessary materialization leads to a lot of spills and, in some cases, OOMs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
