[
https://issues.apache.org/jira/browse/PIG-3809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13934015#comment-13934015
]
Rohini Palaniswamy commented on PIG-3809:
-----------------------------------------
Wouldn't it be better to just set foreach.setAlias(op.getAlias()); instead of
foreach.setAlias(op.getAlias() + "_foreach");? Is there a reason for adding the
_foreach suffix? I don't know how the change in alias is propagated to the
steps after "d", if there are any. It would be better to check that as well if
we are changing the name of the alias.
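
To make the question concrete, here is a toy Java sketch of the three options,
using the query from the issue description. It assumes the join simply prefixes
each output field with the alias of its input; Op, addForEach and joinSchema
are hypothetical stand-ins for illustration, not Pig's actual classes or the
code in the patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: these are NOT Pig's real classes.
class AliasSketch {
    /** Hypothetical stand-in for a logical operator: an alias plus output fields. */
    static final class Op {
        final String alias;        // may be null, as for a foreach with no alias set
        final List<String> fields;

        Op(String alias, List<String> fields) {
            this.alias = alias;
            this.fields = fields;
        }
    }

    /** Models inserting a pruning foreach after an operator, under a chosen alias. */
    static Op addForEach(List<String> keptFields, String newAlias) {
        return new Op(newAlias, keptFields);
    }

    /** Assumption: the join prefixes every field with the alias of its input. */
    static List<String> joinSchema(Op left, Op right) {
        List<String> out = new ArrayList<>();
        for (String f : left.fields) {
            out.add(left.alias + "::" + f);   // a null alias concatenates as "null"
        }
        for (String f : right.fields) {
            out.add(right.alias + "::" + f);
        }
        return out;
    }

    public static void main(String[] args) {
        Op a = new Op("a", Arrays.asList("x", "y", "z"));
        Op b = new Op("b", Arrays.asList("i", "j", "k"));
        List<String> x = Arrays.asList("x");
        List<String> i = Arrays.asList("i");

        // No alias set on the added foreach (the reported bug).
        System.out.println(joinSchema(addForEach(x, null), addForEach(i, null)));
        // Reuse the original alias: downstream references to a::x still match.
        System.out.println(joinSchema(addForEach(x, a.alias), addForEach(i, b.alias)));
        // Append "_foreach": downstream references to a::x no longer match.
        System.out.println(joinSchema(addForEach(x, a.alias + "_foreach"),
                                      addForEach(i, b.alias + "_foreach")));
    }
}
```

In this model the three cases print [null::x, null::i], [a::x, b::i] and
[a_foreach::x, b_foreach::i]. Only the second still matches the a::x and b::i
that "d" refers to, which is why the propagation of a renamed alias seems worth
checking. Whether Pig resolves these references the same way is not something
the sketch can show.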
> AddForEach optimization doesn't set the alias of the added foreach
> ------------------------------------------------------------------
>
> Key: PIG-3809
> URL: https://issues.apache.org/jira/browse/PIG-3809
> Project: Pig
> Issue Type: Bug
> Components: impl
> Reporter: Cheolsoo Park
> Assignee: Cheolsoo Park
> Fix For: 0.13.0
>
> Attachments: PIG-3809-1.patch
>
>
> AddForEach inserts a foreach operator into the plan, but it doesn't set the
> alias of the added foreach. This is usually okay, but if the foreach is
> followed by a join, the missing alias confuses Pig.
> For example, consider the following query (a dummy example to demonstrate
> the issue):
> {code}
> a = LOAD 'foo' AS (x, y, z);
> b = LOAD 'bar' AS (i, j, k);
> c = JOIN a BY x, b BY i;
> d = FOREACH c GENERATE a::x, b::i;
> DUMP d;
> {code}
> Without the AddForEach optimization, the output schema of 'c' is as follows:
> {code}
> a::x, a::y, a::z, b::i, b::j, b::k
> {code}
> But since 'a::y', 'a::z', 'b::j', and 'b::k' are not used in 'd', a foreach
> operator is inserted after each of 'a' and 'b'. That is:
> {code}
> a = LOAD 'foo' AS (x, y, z);
> ? = FOREACH a GENERATE x; -- no alias is set
> b = LOAD 'bar' AS (i, j, k);
> ? = FOREACH b GENERATE i; -- no alias is set
> c = JOIN ? BY x, ? BY i;
> d = FOREACH c GENERATE ?::x, ?::i;
> DUMP d;
> {code}
> Because these added foreach operators have no alias, the output schema of
> the join is wrong. The missing aliases show up as null, so printing the
> output schema of the join gives 'null::x, null::i'.