[ https://issues.apache.org/jira/browse/PIG-4843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohini Palaniswamy updated PIG-4843:
------------------------------------
    Summary: Turn off combiner in reducer vertex for Tez if bags are in combine plan (was: Turn off combiner in reducer vertex for Tez)
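The rule in the new summary can be sketched roughly as follows. This is a hypothetical illustration, not Pig's actual implementation. `ReducerCombinerRule`, `PlanOp`, and `DataKind` are invented names that stand in for the per-field partial results the combine plan emits. Only the reducer-side decision is modeled: the combiner is skipped during the reducer's merge whenever any partial result carries a bag.

{code:java}
import java.util.List;

/**
 * Hypothetical sketch (not Pig code): decide whether the combiner should run
 * during the reducer vertex's merge, based on the kinds of partial results
 * the combine plan produces. PlanOp and DataKind are invented for illustration.
 */
public class ReducerCombinerRule {

    enum DataKind { SCALAR, TUPLE, BAG }

    /** One output field of the (hypothetical) combine plan. */
    static final class PlanOp {
        final String name;
        final DataKind resultKind;

        PlanOp(String name, DataKind resultKind) {
            this.name = name;
            this.resultKind = resultKind;
        }
    }

    /** Returns false as soon as any combine-plan output carries a bag. */
    static boolean runCombinerInReducer(List<PlanOp> combineOutputs) {
        for (PlanOp op : combineOutputs) {
            if (op.resultKind == DataKind.BAG) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Illustrative partial results for a group/foreach with MIN, MAX and DISTINCT.
        List<PlanOp> withDistinct = List.of(
                new PlanOp("group", DataKind.SCALAR),
                new PlanOp("MIN partial", DataKind.SCALAR),
                new PlanOp("MAX partial", DataKind.SCALAR),
                new PlanOp("DISTINCT partial", DataKind.BAG));
        List<PlanOp> withoutDistinct = withDistinct.subList(0, 3);
        System.out.println(runCombinerInReducer(withDistinct));    // false
        System.out.println(runCombinerInReducer(withoutDistinct)); // true
    }
}
{code}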

Stack trace for reference

{code}
ERROR 1066: Unable to open iterator for alias Final_Events. Backend error : Vertex failed, vertexName=scope-656, vertexId=vertex_1454418370697_126672_1_03, diagnostics=[Task failed, taskId=task_1454418370697_126672_1_03_000957, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:org.apache.pig.backend.executionengine.ExecException: ERROR 0: org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$ShuffleError: error in shuffle in MemtoDiskMerger [scope_614]
        at org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POShuffleTezLoad.attachInputs(POShuffleTezLoad.java:121)
        at org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.initializeInputs(PigProcessor.java:332)
        at org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.run(PigProcessor.java:210)
        at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
        at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$ShuffleError: error in shuffle in MemtoDiskMerger [scope_614]
        at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:361)
        at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:337)
        ... 5 more
Caused by: java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Arrays.java:3236)
        at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
        at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
        at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
        at java.io.DataOutputStream.write(DataOutputStream.java:107)
        at java.io.DataOutputStream.writeUTF(DataOutputStream.java:401)
        at java.io.DataOutputStream.writeUTF(DataOutputStream.java:323)
        at org.apache.pig.data.utils.SedesHelper.writeChararray(SedesHelper.java:66)
        at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:580)
        at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:462)
        at org.apache.pig.data.utils.SedesHelper.writeGenericTuple(SedesHelper.java:135)
        at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:650)
        at org.apache.pig.data.BinInterSedes.writeBag(BinInterSedes.java:641)
        at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:474)
        at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:462)
        at org.apache.pig.data.utils.SedesHelper.writeGenericTuple(SedesHelper.java:135)
        at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:650)
        at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:470)
        at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:462)
        at org.apache.pig.data.utils.SedesHelper.writeGenericTuple(SedesHelper.java:135)
        at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:650)
        at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:470)
        at org.apache.pig.data.BinSedesTuple.write(BinSedesTuple.java:40)
        at org.apache.pig.impl.io.PigNullableWritable.write(PigNullableWritable.java:139)
        at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:98)
        at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:82)
        at org.apache.tez.runtime.library.common.sort.impl.IFile$Writer.append(IFile.java:274)
        at org.apache.tez.mapreduce.combine.MRCombiner$2.write(MRCombiner.java:163)
        at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
        at org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.write(WrappedReducer.java:105)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.processOnePackageOutput(PigCombiner.java:211)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:173)
{code}

> Turn off combiner in reducer vertex for Tez if bags are in combine plan
> -----------------------------------------------------------------------
>
>                 Key: PIG-4843
>                 URL: https://issues.apache.org/jira/browse/PIG-4843
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Rohini Palaniswamy
>            Assignee: Rohini Palaniswamy
>             Fix For: 0.16.0
>
>
> {code}
> B = group A by key;
> C = foreach B {
>     key_value          = A.key_value;
>     distinct_key_value = DISTINCT key_value;
>     generate group,
>              MIN(A.key_value) as min_value,
>              MAX(A.key_value) as max_value,
>              COUNT(distinct_key_value) as distinct_values;
> }
> {code}
> In the above example, the combine plan holds the Distinct bag, which causes an
> OOM when the combiner is run by the MergeManager in the reducer. We did not have
> this issue with MapReduce because, until now, the combiner has not run in the
> reducer with the new API (MAPREDUCE-5221).
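A small simulation may help show why a bag in the combine plan is a problem. The sketch below is a hypothetical illustration, not Pig code: `CombinerStateDemo`, `ScalarPartial`, `distinctPartial`, and `mergeDistinct` are invented names. It contrasts the partial state for MIN/MAX/COUNT, which stays three numbers no matter how many inputs are merged, with the partial state for a DISTINCT, which is the set of every distinct value seen so far and only grows as partials are merged. In the stack trace above, that kind of ever-growing bag is what gets serialized into an in-memory `ByteArrayOutputStream` by `BinInterSedes.writeBag` while the combiner runs during the reducer-side merge.

{code:java}
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical demo: constant-size scalar partials vs. a growing DISTINCT partial. */
public class CombinerStateDemo {

    /** Constant-size partial state for MIN, MAX and COUNT. */
    static final class ScalarPartial {
        final long min;
        final long max;
        final long count;

        ScalarPartial(long min, long max, long count) {
            this.min = min;
            this.max = max;
            this.count = count;
        }

        static ScalarPartial of(List<Long> values) {
            long min = Long.MAX_VALUE;
            long max = Long.MIN_VALUE;
            for (long v : values) {
                min = Math.min(min, v);
                max = Math.max(max, v);
            }
            return new ScalarPartial(min, max, values.size());
        }

        /** Merging two scalar partials still yields just three longs. */
        ScalarPartial merge(ScalarPartial other) {
            return new ScalarPartial(Math.min(min, other.min),
                    Math.max(max, other.max), count + other.count);
        }
    }

    /** DISTINCT partial state: every distinct value must be kept. */
    static Set<Long> distinctPartial(List<Long> values) {
        return new HashSet<>(values);
    }

    /** Merging two DISTINCT partials is a set union, so it never shrinks. */
    static Set<Long> mergeDistinct(Set<Long> a, Set<Long> b) {
        Set<Long> merged = new HashSet<>(a);
        merged.addAll(b);
        return merged;
    }

    public static void main(String[] args) {
        ScalarPartial scalar = null;
        Set<Long> distinct = new HashSet<>();
        // Simulate 100 map outputs for one key, each with 1,000 new distinct values.
        for (int spill = 0; spill < 100; spill++) {
            List<Long> values = new ArrayList<>();
            for (long i = 0; i < 1_000; i++) {
                values.add(spill * 1_000L + i);
            }
            ScalarPartial p = ScalarPartial.of(values);
            scalar = (scalar == null) ? p : scalar.merge(p);
            distinct = mergeDistinct(distinct, distinctPartial(values));
        }
        System.out.println("scalar partial: min=" + scalar.min + " max=" + scalar.max
                + " count=" + scalar.count);
        System.out.println("distinct partial size: " + distinct.size());
    }
}
{code}

Running it, the scalar partial ends as min=0, max=99999, count=100000 (still three longs), while the DISTINCT partial holds 100,000 entries for a single key.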



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
