[https://issues.apache.org/jira/browse/PIG-4790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15140582#comment-15140582]
Rohini Palaniswamy commented on PIG-4790:
-----------------------------------------
The difference in the complex script is that one of the edges is a vertex
group. The fix has a problem, though: it turns off UnionOptimizer for the simple
case as well, where the edges are normal and which the previous patch already
handled. We should avoid turning off UnionOptimizer as much as possible, because
the performance of UnorderedPartitionedKVOutput is currently very bad and has not
been fixed yet. It would be good to add the script to TestTezCompiler as well.
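A minimal sketch of the narrower check: skip the union merge only when it would produce a duplicate edge, instead of turning off UnionOptimizer outright. It models the DAG as a plain adjacency map of vertex names. The class UnionMergeCheck and its methods are hypothetical and are not Pig's TezOperPlan or UnionOptimizer API. The names scope-43 and scope-55 come from the exception in the issue. Which operators live in each vertex is an assumption.

```java
import java.util.*;

/**
 * Hypothetical sketch, not Pig's actual UnionOptimizer API.
 * Models a DAG as vertex name -> set of successor vertex names.
 */
class UnionMergeCheck {
    private final Map<String, Set<String>> edges = new HashMap<>();

    void addEdge(String from, String to) {
        edges.computeIfAbsent(from, k -> new LinkedHashSet<>()).add(to);
    }

    boolean hasEdge(String from, String to) {
        return edges.getOrDefault(from, Collections.emptySet()).contains(to);
    }

    /**
     * Returns true if the union's predecessors can be connected directly
     * to the successor (as a vertex group) without defining any
     * predecessor -> successor edge twice.
     */
    boolean canMergeUnion(List<String> unionPredecessors, String successor) {
        Set<String> seen = new HashSet<>();
        for (String pred : unionPredecessors) {
            // An existing pred -> successor edge would be defined a second time.
            if (hasEdge(pred, successor)) {
                return false;
            }
            // Two union inputs from the same vertex would also yield the same pair twice.
            if (!seen.add(pred)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        UnionMergeCheck dag = new UnionMergeCheck();
        // Assumed layout: scope-43 already feeds the join vertex scope-55 directly.
        dag.addEdge("scope-43", "scope-55");
        // Merging a union fed from scope-43 into scope-55 would collide.
        System.out.println(dag.canMergeUnion(List.of("scope-43"), "scope-55")); // false
        // Union inputs from distinct vertices with no prior edge can still merge.
        System.out.println(dag.canMergeUnion(List.of("v-b", "v-c"), "scope-55")); // true
    }
}
```

Because the check is made per predecessor/successor pair, the simple case with normal edges would still be optimized. Only a colliding pair would keep its explicit union vertex.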
> Join after union fail due to UnionOptimizer
> -------------------------------------------
>
> Key: PIG-4790
> URL: https://issues.apache.org/jira/browse/PIG-4790
> Project: Pig
> Issue Type: Bug
> Components: tez
> Reporter: Daniel Dai
> Assignee: Daniel Dai
> Fix For: 0.16.0
>
> Attachments: PIG-4790-1.patch, PIG-4790-2.patch
>
>
> The following script fails to run:
> {code}
> rmf ooo
> a = load 'student.txt' as (name:chararray, age:int, gpa:double);
> b = filter a by age > 65;
> c = filter a by age <=10;
> d = union b, c;
> e = join a by name left, d by name;
> store e into 'ooo';
> {code}
> Exception stack:
> {code}
> Caused by: java.lang.IllegalArgumentException: Edge [scope-43 : org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor] -> [scope-55 : org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor] ({ SCATTER_GATHER : org.apache.tez.runtime.library.input.OrderedGroupedKVInput >> PERSISTED >> org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput >> NullEdgeManager }) already defined!
> at org.apache.tez.dag.api.DAG.addEdge(DAG.java:272)
> at org.apache.pig.backend.hadoop.executionengine.tez.TezDagBuilder.visitTezOp(TezDagBuilder.java:311)
> at org.apache.pig.backend.hadoop.executionengine.tez.plan.TezOperator.visit(TezOperator.java:252)
> at org.apache.pig.backend.hadoop.executionengine.tez.plan.TezOperator.visit(TezOperator.java:56)
> at org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:87)
> at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:46)
> at org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler.buildDAG(TezJobCompiler.java:65)
> at org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler.getJob(TezJobCompiler.java:111)
> ... 20 more
> {code}
> With pig.tez.opt.union disabled, the script runs fine.
> It seems we should detect this pattern and disallow merging a vertex group
> into a vertex pair that already has an edge.