[ 
https://issues.apache.org/jira/browse/PIG-5083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15822909#comment-15822909
 ] 

Rohini Palaniswamy commented on PIG-5083:
-----------------------------------------

If you look at 
https://github.com/apache/pig/blob/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/Packager.java#L113-L129
 , in the case of MapReduce, where readOnce is true, records are read from the 
PeekedBag (which extends ReadOnceBag) and put into an InternalCachedBag before 
being handed off to CombinerPackager.getNext(), which then creates different 
bags for the rest of the plan to work with. Since the bag is only iterated 
once, there is no need to materialize it into an InternalCachedBag. Iteration 
can be done directly on the ReadOnceBag.

 What this patch does is pass the PeekedBag directly to the CombinerPackager in 
the case of MapReduce and pass a TezReadOnceBag with Tez. Tez was always 
constructing an InternalCachedBag before and did not have the concept of a 
ReadOnceBag (readOnce was always false). The change saves one copy and a lot of 
memory and GC overhead.
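
To illustrate the difference in isolation, here is a minimal sketch. It does 
not use Pig's classes: ReadOnceSketch, ReadOnce, sumMaterialized and 
sumStreaming are hypothetical names made up for this example, and a plain 
ArrayList stands in for the materialized bag. It only shows why a consumer 
that iterates its input once can work on a read-once source without the 
intermediate copy.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class ReadOnceSketch {

    /** Hypothetical stand-in for a read-once bag: it can be iterated only once. */
    static final class ReadOnce<T> implements Iterable<T> {
        private final Iterator<T> source;
        private boolean consumed = false;

        ReadOnce(Iterator<T> source) {
            this.source = source;
        }

        @Override
        public Iterator<T> iterator() {
            if (consumed) {
                throw new IllegalStateException("read-once input already iterated");
            }
            consumed = true;
            return source;
        }
    }

    /** Materializing variant: copies every record into a list before using it. */
    static long sumMaterialized(Iterable<Long> input) {
        List<Long> copy = new ArrayList<>();
        for (Long v : input) {
            copy.add(v); // extra copy held in memory
        }
        long sum = 0;
        for (Long v : copy) {
            sum += v;
        }
        return sum;
    }

    /** Streaming variant: consumes the read-once input directly, no copy. */
    static long sumStreaming(Iterable<Long> input) {
        long sum = 0;
        for (Long v : input) {
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Long> records = Arrays.asList(1L, 2L, 3L, 4L);
        System.out.println(sumMaterialized(new ReadOnce<>(records.iterator())));
        System.out.println(sumStreaming(new ReadOnce<>(records.iterator())));
    }
}
```

Both variants produce the same result; the streaming one simply never holds 
the full set of records at once, which is the effect the patch is after.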

> CombinerPackager and LitePackager should not materialize bags
> -------------------------------------------------------------
>
>                 Key: PIG-5083
>                 URL: https://issues.apache.org/jira/browse/PIG-5083
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Rohini Palaniswamy
>            Assignee: Rohini Palaniswamy
>             Fix For: 0.17.0
>
>         Attachments: PIG-5083-1.patch
>
>
> Before PIG-3591 and the creation of CombinerPackager, POCombinerPackage 
> directly read from the combiner/reducer input instead of materializing the bag.
> https://github.com/apache/pig/blob/branch-0.12/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POCombinerPackage.java#L140-L161
> The unnecessary materialization leads to a lot of spills and OOMs in some cases.



