[ https://issues.apache.org/jira/browse/PIG-5083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15822747#comment-15822747 ]

Daniel Dai commented on PIG-5083:
---------------------------------

I didn't get what you mean by not materializing the bag in https://github.com/apache/pig/blob/branch-0.12/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POCombinerPackage.java#L140-L161. In that code, an InternalCachedBag is created, and tuples are taken from the iterator and added to the bag.

On the other hand, I can see that LitePackager is not given a ReadOnceBag in Tez, and you do fix that in the patch.

> CombinerPackager and LitePackager should not materialize bags
> -------------------------------------------------------------
>
>                 Key: PIG-5083
>                 URL: https://issues.apache.org/jira/browse/PIG-5083
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Rohini Palaniswamy
>            Assignee: Rohini Palaniswamy
>             Fix For: 0.17.0
>
>         Attachments: PIG-5083-1.patch
>
>
> Before PIG-3591 and the creation of CombinerPackager, POCombinerPackage read directly from the combiner/reducer input instead of materializing the bag.
> https://github.com/apache/pig/blob/branch-0.12/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POCombinerPackage.java#L140-L161
> The unnecessary materialization leads to a lot of spills, and to OOMs in some cases.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)