[
https://issues.apache.org/jira/browse/PIG-4843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Rohini Palaniswamy updated PIG-4843:
------------------------------------
Summary: Turn off combiner in reducer vertex for Tez if bags are in combine
plan (was: Turn off combiner in reducer vertex for Tez)
Stack trace for reference
{code}
ERROR 1066: Unable to open iterator for alias Final_Events. Backend error :
Vertex failed, vertexName=scope-656, vertexId=vertex_1454418370697_126672_1_03,
diagnostics=[Task failed, taskId=task_1454418370697_126672_1_03_000957,
diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running
task:org.apache.pig.backend.executionengine.ExecException: ERROR 0:
org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$ShuffleError:
error in shuffle in MemtoDiskMerger [scope_614]
at
org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POShuffleTezLoad.attachInputs(POShuffleTezLoad.java:121)
at
org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.initializeInputs(PigProcessor.java:332)
at
org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.run(PigProcessor.java:210)
at
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344)
at
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
at
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679)
at
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
at
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by:
org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$ShuffleError:
error in shuffle in MemtoDiskMerger [scope_614]
at
org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:361)
at
org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:337)
... 5 more
Caused by: java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3236)
at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
at
java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
at java.io.DataOutputStream.write(DataOutputStream.java:107)
at java.io.DataOutputStream.writeUTF(DataOutputStream.java:401)
at java.io.DataOutputStream.writeUTF(DataOutputStream.java:323)
at
org.apache.pig.data.utils.SedesHelper.writeChararray(SedesHelper.java:66)
at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:580)
at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:462)
at
org.apache.pig.data.utils.SedesHelper.writeGenericTuple(SedesHelper.java:135)
at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:650)
at org.apache.pig.data.BinInterSedes.writeBag(BinInterSedes.java:641)
at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:474)
at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:462)
at
org.apache.pig.data.utils.SedesHelper.writeGenericTuple(SedesHelper.java:135)
at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:650)
at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:470)
at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:462)
at
org.apache.pig.data.utils.SedesHelper.writeGenericTuple(SedesHelper.java:135)
at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:650)
at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:470)
at org.apache.pig.data.BinSedesTuple.write(BinSedesTuple.java:40)
at
org.apache.pig.impl.io.PigNullableWritable.write(PigNullableWritable.java:139)
at
org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:98)
at
org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:82)
at
org.apache.tez.runtime.library.common.sort.impl.IFile$Writer.append(IFile.java:274)
at
org.apache.tez.mapreduce.combine.MRCombiner$2.write(MRCombiner.java:163)
at
org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
at
org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.write(WrappedReducer.java:105)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.processOnePackageOutput(PigCombiner.java:211)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:173)
{code}
> Turn off combiner in reducer vertex for Tez if bags are in combine plan
> -----------------------------------------------------------------------
>
> Key: PIG-4843
> URL: https://issues.apache.org/jira/browse/PIG-4843
> Project: Pig
> Issue Type: Improvement
> Reporter: Rohini Palaniswamy
> Assignee: Rohini Palaniswamy
> Fix For: 0.16.0
>
>
> {code}
> B = group A by key;
> C = foreach B {
> key_value = A.key_value;
> distinct_key_value = DISTINCT
> key_value;
> generate group, MIN(A.key_value) as
> min_value, MAX(A.key_value) as max_value, COUNT(distinct_key_value) as
> distinct_values;
> }
> {code}
> In the above example, the combine plan holds the Distinct bag and it causes
> OOM when combiner is run by the MergeManager in the reducer. We did not have
> this issue with mapreduce as combiner is not running in reducer for new API
> till now (MAPREDUCE-5221)
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)