[ 
https://issues.apache.org/jira/browse/PIG-4843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15205227#comment-15205227
 ] 

Rohini Palaniswamy commented on PIG-4843:
-----------------------------------------

bq. Do you mean once MAPREDUCE-5221 is fixed, we might see this in MR as well?
  Yes. When the MergeManager is running, a lot of memory is already in use by
the framework: the shuffle buffers, the io.sort.mb worth of memory used for
merging, and DataOutputBuffer, which keeps doubling in size and therefore
requires a lot of memory (TEZ-3159). In this case the combiner also hardly
reduces the data, because the Distinct bags are still very big.

> Turn off combiner in reducer vertex for Tez if bags are in combine plan
> -----------------------------------------------------------------------
>
>                 Key: PIG-4843
>                 URL: https://issues.apache.org/jira/browse/PIG-4843
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Rohini Palaniswamy
>            Assignee: Rohini Palaniswamy
>             Fix For: 0.16.0
>
>         Attachments: PIG-4843-1.patch
>
>
> {code}
> B = group A by key;
> C = foreach B {
>     key_value          = A.key_value;
>     distinct_key_value = DISTINCT key_value;
>     generate group,
>              MIN(A.key_value) as min_value,
>              MAX(A.key_value) as max_value,
>              COUNT(distinct_key_value) as distinct_values;
> }
> {code}
> In the above example, the combine plan holds the Distinct bag, which causes
> an OOM when the combiner is run by the MergeManager in the reducer. We did
> not have this issue with MapReduce because, with the new API, the combiner
> has not been run in the reducer so far (MAPREDUCE-5221).
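>
> As a rough illustration of a script-level workaround (an assumption for
> jobs hitting this OOM, not the change made in PIG-4843-1.patch), the
> combiner can be switched off for the whole script with the existing
> pig.exec.nocombiner property. Note that this disables the combiner on the
> map side as well, so MIN and MAX lose their combiner benefit too:
> {code}
> -- Assumption: losing the combiner everywhere is acceptable for this script.
> -- Place this before the group/foreach statements shown above.
> set pig.exec.nocombiner true;
> {code}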



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
