[ 
https://issues.apache.org/jira/browse/PIG-3809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13933844#comment-13933844
 ] 

Kyungho Jeon commented on PIG-3809:
-----------------------------------

I became aware of this a few weeks ago, but didn't know it was a bug. :) 

> AddForEach optimization doesn't set the alias of the added foreach
> ------------------------------------------------------------------
>
>                 Key: PIG-3809
>                 URL: https://issues.apache.org/jira/browse/PIG-3809
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>            Reporter: Cheolsoo Park
>            Assignee: Cheolsoo Park
>             Fix For: 0.13.0
>
>         Attachments: PIG-3809-1.patch
>
>
> AddForEach inserts a foreach operator into the plan, but it doesn't set the 
> alias of the added foreach. This is usually okay, but if the foreach is followed 
> by a join, the missing alias confuses Pig.
> For example, consider the following query (a dummy example to demonstrate the issue):
> {code}
> a = LOAD 'foo' AS (x, y, z);
> b = LOAD 'bar' AS (i, j, k);
> c = JOIN a BY x, b BY i;
> d = FOREACH c GENERATE a::x, b::i;
> DUMP d;
> {code}
> Without the AddForEach optimization, the output schema of 'c' is as follows:
> {code}
> a::x, a::y, a::z, b::i, b::j, b::k
> {code}
> But since 'a::y', 'a::z', 'b::j', and 'b::k' are not used in 'd', a foreach 
> operator is inserted after each of a and b. That is:
> {code}
> a = LOAD 'foo' AS (x, y, z);
> ? = FOREACH a GENERATE x; -- no alias is set
> b = LOAD 'bar' AS (i, j, k);
> ? = FOREACH b GENERATE i; -- no alias is set
> c = JOIN ? BY x, ? BY i;
> d = FOREACH c GENERATE ?::x, ?::i;
> DUMP d;
> {code}
> But because these added foreach operators have no alias, the output 
> schema of the join is wrong. The missing aliases show up as null, so 
> printing the output schema of the join gives 'null::x, null::i'.
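> For comparison, here is a sketch of what the rewritten plan could look like 
> if each inserted foreach carried an alias. It assumes the inserted foreach 
> simply reuses the alias of its input ('a' and 'b'); that naming is an 
> assumption made for illustration, not necessarily what the attached patch does.
> {code}
> a = LOAD 'foo' AS (x, y, z);
> a = FOREACH a GENERATE x; -- assumed: inserted foreach reuses alias 'a'
> b = LOAD 'bar' AS (i, j, k);
> b = FOREACH b GENERATE i; -- assumed: inserted foreach reuses alias 'b'
> c = JOIN a BY x, b BY i;
> d = FOREACH c GENERATE a::x, b::i;
> DUMP d;
> {code}
> With aliases in place, the join output schema would be 'a::x, b::i', so the 
> references in 'd' resolve as they do in the original query.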



--
This message was sent by Atlassian JIRA
(v6.2#6252)
