[ https://issues.apache.org/jira/browse/PIG-5082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15824102#comment-15824102 ]
Travis Woodruff commented on PIG-5082:
--------------------------------------

Bad news, [~rohini]. The patch for PIG-5078 (I used PIG-5078-3.patch) fixed my test above, but it didn't fix my actual script that was failing. The following still fails with the same error (added an additional split and join to the end):

{code}
a = LOAD '/tmp/empty.txt' USING PigStorage('\t') AS (x:chararray);
b = LOAD '/tmp/empty.txt' USING PigStorage('\t') AS (x:chararray);
c = LOAD '/tmp/empty.txt' USING PigStorage('\t') AS (y:chararray);
u1 = UNION ONSCHEMA a, b;
SPLIT u1 INTO r IF x != '', s OTHERWISE;
d = JOIN r BY x LEFT, c BY y;
u2 = UNION ONSCHEMA d, s;
e = FILTER u2 BY x == '';
f = FILTER u2 BY x == 'm';
u3 = UNION ONSCHEMA e, f;
SPLIT u3 INTO t if x != '', u OTHERWISE;
v = JOIN t BY x LEFT, c BY y;
DUMP v;
{code}

This also fails (same as previous but with limit instead of final join):

{code}
a = LOAD '/tmp/empty.txt' USING PigStorage('\t') AS (x:chararray);
b = LOAD '/tmp/empty.txt' USING PigStorage('\t') AS (x:chararray);
c = LOAD '/tmp/empty.txt' USING PigStorage('\t') AS (y:chararray);
u1 = UNION ONSCHEMA a, b;
SPLIT u1 INTO r IF x != '', s OTHERWISE;
d = JOIN r BY x LEFT, c BY y;
u2 = UNION ONSCHEMA d, s;
e = FILTER u2 BY x == '';
f = FILTER u2 BY x == 'm';
u3 = UNION ONSCHEMA e, f;
SPLIT u3 INTO t if x != '', u OTHERWISE;
v = LIMIT t 10;
DUMP t;
{code}

> Tez UnionOptimizer creates vertex group with one member
> -------------------------------------------------------
>
>                 Key: PIG-5082
>                 URL: https://issues.apache.org/jira/browse/PIG-5082
>             Project: Pig
>          Issue Type: Bug
>          Components: tez
>    Affects Versions: 0.16.0
>            Reporter: Travis Woodruff
>            Assignee: Rohini Palaniswamy
>            Priority: Minor
>             Fix For: 0.17.0, 0.16.1
>
>         Attachments: PIG-5082.patch
>
>
> This script results in a vertex group with one member:
> {code}
> a = LOAD '/tmp/empty.txt' USING PigStorage('\t') AS (x:chararray);
> b = LOAD '/tmp/empty.txt' USING PigStorage('\t') AS (x:chararray);
> c = LOAD '/tmp/empty.txt' USING PigStorage('\t') AS (y:chararray);
> u1 = UNION ONSCHEMA a, b;
> SPLIT u1 INTO r IF x != '', s OTHERWISE;
> d = JOIN r BY x LEFT, c BY y;
> u2 = UNION ONSCHEMA d, s;
> e = FILTER u2 BY x == '';
> f = FILTER u2 BY x == 'm';
> u3 = UNION ONSCHEMA e, f;
> DUMP u3;
> {code}
> Which results in:
> {code}
> java.lang.IllegalArgumentException: VertexGroup must have at least 2 members
> at org.apache.tez.dag.api.VertexGroup.<init>(VertexGroup.java:77)
> at org.apache.tez.dag.api.DAG.createVertexGroup(DAG.java:202)
> at org.apache.pig.backend.hadoop.executionengine.tez.TezDagBuilder.visitTezOp(TezDagBuilder.java:396)
> at org.apache.pig.backend.hadoop.executionengine.tez.plan.TezOperator.visit(TezOperator.java:255)
> ...
> {code}
> This seems to be happening because {{UnionOptimizer}} is replacing a union with a vertex group and then optimizing away a predecessor union thus removing a node and resulting in a vertex group with one member.
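
To make the suspected mechanism in the description concrete, here is a toy model of it. This is only a sketch and not Pig or Tez code: the class {{VertexGroupSketch}}, its methods, and the vertex names are all hypothetical, and the real {{UnionOptimizer}} logic may differ. It shows a two-member group dropping to one member once a predecessor union is optimized away, and a size check of the kind the stack trace reports.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: none of these names are real Pig or Tez classes.
public class VertexGroupSketch {

    // Mirrors the precondition reported in the stack trace above.
    public static void checkGroup(List<String> members) {
        if (members.size() < 2) {
            throw new IllegalArgumentException(
                "VertexGroup must have at least 2 members");
        }
    }

    // Hypothetical guard an optimizer could apply before creating a group.
    public static boolean canFormGroup(List<String> members) {
        return members.size() >= 2;
    }

    // Models "optimizing away a predecessor union": the node is removed
    // from the member list of the group that was already planned.
    public static List<String> withoutMember(List<String> members, String removed) {
        List<String> remaining = new ArrayList<>(members);
        remaining.remove(removed);
        return remaining;
    }

    public static void main(String[] args) {
        // A union was replaced by a vertex group with two members (assumed names).
        List<String> group = Arrays.asList("predecessorUnion", "otherVertex");
        checkGroup(group); // passes: two members

        // The predecessor union is then optimized away.
        List<String> after = withoutMember(group, "predecessorUnion");
        System.out.println("can form group: " + canFormGroup(after));

        try {
            checkGroup(after);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

If the real optimizer behaves like this model, re-checking the member count after each removal (or skipping the vertex-group replacement when fewer than two members would remain) would avoid handing Tez a one-member group. That is an assumption, not a confirmed fix.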