FLATTEN, reorder columns, UNION causes uid conflict
---------------------------------------------------
Key: PIG-2465
URL: https://issues.apache.org/jira/browse/PIG-2465
Project: Pig
Issue Type: Bug
Affects Versions: 0.8.1, 0.9.1, 0.10
Reporter: David Wahler
This is a regression in the new logical plan that causes incorrect results in
0.8/0.9, and a fatal "duplicate uid in schema" error on trunk. The following
script demonstrates the problem (extracted and simplified from a much larger
script):
{code}A = LOAD 'bug.in' AS (x:{t:(x:int)}, y:{t:(y:int)});
B1 = FOREACH A GENERATE FLATTEN(x),FLATTEN(y);
B2 = FOREACH A GENERATE FLATTEN(y),FLATTEN(x);
C = UNION B1, B2;
D = GROUP C BY *;
DUMP D;{code}
Input data (tab-separated, since the LOAD uses the default PigStorage loader):
{code}{(1)} {(2)}
{(1)} {(3)}{code}
C contains the correct data:
{code}(1,2)
(2,1)
(1,3)
(3,1){code}
D should use the entire tuple as the group key (making it essentially a
DISTINCT), but instead the output is:
{code}((1,1),{(1,2),(1,3)})
((2,2),{(2,1)})
((3,3),{(3,1)}){code}
The GROUP operation uses ($0,$0) as the key instead of ($0,$1). The logical
plan includes the line {{C: (Name: LOUnion Schema: x::x#37:int,y::y#37:int)}},
in which both fields of C's schema carry the same uid (#37).
Switching to the old logical plan produces the correct output in 0.8, but
apparently this is no longer possible in 0.9 and later versions.
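The effect of the duplicate uid can be reproduced with a small Python model. This is a hypothetical sketch, not Pig's actual code: it assumes that GROUP C BY * resolves each projected column by looking up the first schema field with a matching uid. The names UNION_SCHEMA, index_by_uid and group are illustrative only. Because both fields of the LOUnion schema carry uid 37, both lookups land on column 0, and the resulting groups match the incorrect output above.

```python
# Hypothetical model of the uid conflict; NOT Pig's implementation.
# Each schema field is (alias, uid); uids mirror the reported LOUnion schema
# "x::x#37:int,y::y#37:int".
UNION_SCHEMA = [("x::x", 37), ("y::y", 37)]

# Rows of relation C, as listed in the report.
C_ROWS = [(1, 2), (2, 1), (1, 3), (3, 1)]


def index_by_uid(schema, uid):
    """Return the position of the FIRST field carrying `uid` (assumed lookup)."""
    for pos, (_alias, field_uid) in enumerate(schema):
        if field_uid == uid:
            return pos
    raise KeyError(uid)


def group(rows, key_positions):
    """Group rows by the values at key_positions, preserving first-seen order."""
    groups = {}
    for row in rows:
        key = tuple(row[p] for p in key_positions)
        groups.setdefault(key, []).append(row)
    return list(groups.items())


# Intended GROUP C BY *: every column projected by position, i.e. ($0,$1).
positional = list(range(len(UNION_SCHEMA)))

# Buggy projection: each column resolved through its uid; the duplicate uid
# makes both resolve to $0, i.e. ($0,$0).
by_uid = [index_by_uid(UNION_SCHEMA, uid) for _alias, uid in UNION_SCHEMA]

if __name__ == "__main__":
    print(positional, group(C_ROWS, positional))
    print(by_uid, group(C_ROWS, by_uid))
```

With positional keys every row forms its own group (the expected DISTINCT-like result), while the uid-based keys reproduce ((1,1),{(1,2),(1,3)}), ((2,2),{(2,1)}) and ((3,3),{(3,1)}).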