[
https://issues.apache.org/jira/browse/PIG-3809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13933844#comment-13933844
]
Kyungho Jeon commented on PIG-3809:
-----------------------------------
I became aware of this a few weeks ago, but didn't know it was a bug. :)
> AddForEach optimization doesn't set the alias of the added foreach
> ------------------------------------------------------------------
>
> Key: PIG-3809
> URL: https://issues.apache.org/jira/browse/PIG-3809
> Project: Pig
> Issue Type: Bug
> Components: impl
> Reporter: Cheolsoo Park
> Assignee: Cheolsoo Park
> Fix For: 0.13.0
>
> Attachments: PIG-3809-1.patch
>
>
> AddForEach inserts a foreach operator into the plan, but it doesn't set the
> alias of the added foreach. This is usually harmless, but if the foreach is
> followed by a join, the missing alias breaks the field prefixes in the join's
> output schema.
> For example, consider the following query (a dummy example to demonstrate the issue):
> {code}
> a = LOAD 'foo' AS (x, y, z);
> b = LOAD 'bar' AS (i, j, k);
> c = JOIN a BY x, b BY i;
> d = FOREACH c GENERATE a::x, b::i;
> DUMP d;
> {code}
> Without the AddForEach optimization, the output schema of 'c' is as follows:
> {code}
> a::x, a::y, a::z, b::i, b::j, b::k
> {code}
> But since 'a::y', 'a::z', 'b::j', and 'b::k' are not used in 'd', a foreach
> operator is inserted after each of 'a' and 'b'. That is:
> {code}
> a = LOAD 'foo' AS (x, y, z);
> ? = FOREACH a GENERATE x; -- no alias is set
> b = LOAD 'bar' AS (i, j, k);
> ? = FOREACH b GENERATE i; -- no alias is set
> c = JOIN ? BY x, ? BY i;
> d = FOREACH c GENERATE ?::x, ?::i;
> DUMP d;
> {code}
> However, because these added foreach operators have no alias, the output
> schema of the join is broken: the missing aliases show up as null, so
> printing the output schema of the join gives 'null::x, null::i'.
--
This message was sent by Atlassian JIRA
(v6.2#6252)