[ https://issues.apache.org/jira/browse/PIG-3959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheolsoo Park updated PIG-3959:
-------------------------------

    Attachment: PIG-3959-1.patch

The problem is as follows:
* If the replicated join runs in the same vertex as the skewed join, three edges are
connected to the join vertex: two for the skewed join input tables and one for the
replicated join input table.
* Since {{TezDagBuilder}} blindly sets the intermediate input/output key class of all
the inbound edges to {{NullablePartitionWritable}}, the key class of the replicated
join input edge is also set to {{NullablePartitionWritable}}.
* This causes the following exception in {{POLocalRearrangeTez}}:
{code}
wrong key class: class org.apache.pig.impl.io.NullableIntWritable is not class org.apache.pig.impl.io.NullablePartitionWritable
{code}

In the attached patch, I changed the condition in {{TezDagBuilder}} to apply
{{NullablePartitionWritable}} only if {{isSkewedJoin && isConnectedToPackage}}, so the
replicated join input edge keeps its own key class (here {{NullableIntWritable}}).
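For illustration only, here is a hypothetical Java model of the condition change. It is not the code from PIG-3959-1.patch and does not use the real {{TezDagBuilder}} API; {{EdgeKeyClassSketch}}, {{keyClassBefore}} and {{keyClassAfter}} are made-up names, key classes are represented as plain strings, and it assumes the old check was effectively {{isSkewedJoin}} alone.

```java
// Hypothetical model of the key-class choice described above.
// NOT Pig's actual TezDagBuilder code; names and signatures are illustrative only.
public class EdgeKeyClassSketch {

    // Key class that the skewed join expects on its package-bound edges.
    static final String PARTITION_KEY = "org.apache.pig.impl.io.NullablePartitionWritable";

    // Assumed old behaviour: every inbound edge of a skewed join vertex gets the
    // partition key class, including the broadcast edge of the replicated join.
    static String keyClassBefore(boolean isSkewedJoin, boolean isConnectedToPackage,
                                 String edgeKeyClass) {
        return isSkewedJoin ? PARTITION_KEY : edgeKeyClass;
    }

    // Behaviour described in the patch: only skewed join edges that feed the
    // package get the partition key class; other edges keep their own key class.
    static String keyClassAfter(boolean isSkewedJoin, boolean isConnectedToPackage,
                                String edgeKeyClass) {
        return (isSkewedJoin && isConnectedToPackage) ? PARTITION_KEY : edgeKeyClass;
    }

    public static void main(String[] args) {
        String intKey = "org.apache.pig.impl.io.NullableIntWritable";
        // Three inbound edges: two skewed join inputs (connected to the package)
        // and one replicated join input (not connected to the package).
        boolean[] connectedToPackage = {true, true, false};
        for (boolean connected : connectedToPackage) {
            System.out.println("connectedToPackage=" + connected
                    + " before=" + keyClassBefore(true, connected, intKey)
                    + " after=" + keyClassAfter(true, connected, intKey));
        }
    }
}
```

In this model only the third edge changes: before the fix it is forced to {{NullablePartitionWritable}}, which matches the "wrong key class" mismatch above, and after the fix it keeps {{NullableIntWritable}}.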

> Skewed join followed by replicated join fails in Tez
> ----------------------------------------------------
>
>                 Key: PIG-3959
>                 URL: https://issues.apache.org/jira/browse/PIG-3959
>             Project: Pig
>          Issue Type: Sub-task
>          Components: tez
>    Affects Versions: tez-branch
>            Reporter: Cheolsoo Park
>            Assignee: Cheolsoo Park
>             Fix For: 0.14.0
>
>         Attachments: PIG-3959-1.patch
>
>
> To reproduce the issue, run the following query:
> {code}
> x = LOAD 'foo' AS (x:int, y:chararray);
> y = LOAD 'bar' AS (x:int, y:chararray);
> a = JOIN x BY x, y BY x USING 'skewed';
> z = LOAD 'zoo' AS (x:int, y:chararray);
> b = JOIN a BY x::x, z BY x USING 'replicated';
> DUMP b;
> {code}
> This fails at runtime with the following error:
> {code}
> : Container released by application, AttemptID:attempt_1399657418038_0357_1_04_000000_3 Info:Error: 
> org.apache.pig.backend.executionengine.ExecException: ERROR 2135: Received error from POLocalRearrage function.wrong key class: class org.apache.pig.impl.io.NullableIntWritable is not class org.apache.pig.impl.io.NullablePartitionWritable
> : at org.apache.pig.backend.hadoop.executionengine.tez.POLocalRearrangeTez.getNextTuple(POLocalRearrangeTez.java:175)
> : at org.apache.pig.backend.hadoop.executionengine.tez.PigProcessor.runPipeline(PigProcessor.java:276)
> : at org.apache.pig.backend.hadoop.executionengine.tez.PigProcessor.run(PigProcessor.java:175)
> : at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:307)
> : at org.apache.hadoop.mapred.YarnTezDagChild$5.run(YarnTezDagChild.java:581)
> : at java.security.AccessController.doPrivileged(Native Method)
> : at javax.security.auth.Subject.doAs(Subject.java:415)
> : at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
> : at org.apache.hadoop.mapred.YarnTezDagChild.main(YarnTezDagChild.java:570)
> : Caused by: java.io.IOException: wrong key class: class org.apache.pig.impl.io.NullableIntWritable is not class org.apache.pig.impl.io.NullablePartitionWritable
> : at org.apache.tez.runtime.library.common.sort.impl.IFile$Writer.append(IFile.java:212)
> : at org.apache.tez.runtime.library.broadcast.output.FileBasedKVWriter.write(FileBasedKVWriter.java:149)
> : at org.apache.pig.backend.hadoop.executionengine.tez.POLocalRearrangeTez.getNextTuple(POLocalRearrangeTez.java:160)
> : ... 8 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)
