FLATTEN, reorder columns, UNION causes uid conflict
---------------------------------------------------

                 Key: PIG-2465
                 URL: https://issues.apache.org/jira/browse/PIG-2465
             Project: Pig
          Issue Type: Bug
    Affects Versions: 0.8.1, 0.9.1, 0.10
            Reporter: David Wahler


This is a regression in the new logical plan that causes incorrect results in 
0.8/0.9, and a fatal "duplicate uid in schema" error on trunk. The following 
script demonstrates the problem (extracted and simplified from a much larger 
script):

{code}A = LOAD 'bug.in' AS (x:{t:(x:int)}, y:{t:(y:int)});
B1 = FOREACH A GENERATE FLATTEN(x),FLATTEN(y);
B2 = FOREACH A GENERATE FLATTEN(y),FLATTEN(x);
C = UNION B1, B2;
D = GROUP C BY *;{code}

Input data:
{code}{(1)}     {(2)}
{(1)}   {(3)}{code}

C contains the correct data:
{code}(1,2)
(2,1)
(1,3)
(3,1){code}

D should use the entire tuple as the group key (making it essentially a 
DISTINCT) but instead the output is:
{code}((1,1),{(1,2),(1,3)})
((2,2),{(2,1)})
((3,3),{(3,1)}){code}

The GROUP operation is using ($0,$0) as the key instead of ($0,$1). The logical 
plan includes the line: {{C: (Name: LOUnion Schema: x::x#37:int,y::y#37:int)}}. 
Both fields of the union schema carry the same uid (#37), which is presumably 
why the second key column resolves to the first. Switching to the old logical 
plan produces the correct output in 0.8, but apparently that option is no 
longer available in 0.9 and later versions.
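
To make the difference between the two keys concrete, here is a minimal sketch in plain Python (not Pig) that groups the four rows of C listed above, once by ($0,$1) and once by ($0,$0). The helper name {{group_by}} is hypothetical and exists only for this illustration; it models the grouping result, not Pig's planner.

```python
from collections import defaultdict

# Rows of C, copied from the report.
C = [(1, 2), (2, 1), (1, 3), (3, 1)]


def group_by(rows, key_columns):
    """Group rows by the tuple of values found at the given column positions."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[i] for i in key_columns)
        groups[key].append(row)
    # Sort by key so the printed order is stable.
    return dict(sorted(groups.items()))


# Key that GROUP C BY * is meant to use: the whole tuple, ($0, $1).
expected = group_by(C, (0, 1))

# Key the report says is actually used: ($0, $0).
observed = group_by(C, (0, 0))

for key, bag in expected.items():
    print("expected", key, bag)
for key, bag in observed.items():
    print("observed", key, bag)
```

Grouping by ($0,$0) gives the three groups shown in the faulty output of D, while grouping by ($0,$1) gives four single-row groups, one per distinct tuple.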


        
