Daniel Dai created PIG-4790:
-------------------------------

             Summary: Join after union fail due to UnionOptimizer
                 Key: PIG-4790
                 URL: https://issues.apache.org/jira/browse/PIG-4790
             Project: Pig
          Issue Type: Bug
          Components: tez
            Reporter: Daniel Dai
            Assignee: Daniel Dai
             Fix For: 0.16.0


The following script fail to run:
{code}
rmf ooo

a = load 'student.txt' as (name:chararray, age:int, gpa:double);
b = filter a by age > 65;
c = filter a by age <=10;
d = union b, c;
e = join a by name left, d by name;
store e into 'ooo';
{code}

Exception stack:
{code}
Caused by: java.lang.IllegalArgumentException: Edge [scope-43 : 
org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor] -> 
[scope-55 : 
org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor] ({ 
SCATTER_GATHER : org.apache.tez.runtime.library.input.OrderedGroupedKVInput >> 
PERSISTED >> org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput 
>> NullEdgeManager }) already defined!
        at org.apache.tez.dag.api.DAG.addEdge(DAG.java:272)
        at 
org.apache.pig.backend.hadoop.executionengine.tez.TezDagBuilder.visitTezOp(TezDagBuilder.java:311)
        at 
org.apache.pig.backend.hadoop.executionengine.tez.plan.TezOperator.visit(TezOperator.java:252)
        at 
org.apache.pig.backend.hadoop.executionengine.tez.plan.TezOperator.visit(TezOperator.java:56)
        at 
org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:87)
        at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:46)
        at 
org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler.buildDAG(TezJobCompiler.java:65)
        at 
org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler.getJob(TezJobCompiler.java:111)
        ... 20 more
{code}

Disable pig.tez.opt.union the script runs fine.

Seems we shall detect this patten and disallow merge vertex group into a pair 
already has an edge.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to