Cheolsoo Park created PIG-3809:
----------------------------------

             Summary: AddForEach optimization doesn't set the alias of the 
added foreach
                 Key: PIG-3809
                 URL: https://issues.apache.org/jira/browse/PIG-3809
             Project: Pig
          Issue Type: Bug
          Components: impl
            Reporter: Cheolsoo Park
            Assignee: Cheolsoo Park
             Fix For: 0.13.0


AddForEach inserts a foreach operator into the plan, but it doesn't set the 
alias of the added foreach. This is usually harmless, but if the foreach is 
followed by a join, the missing alias confuses Pig.

For example, consider the following query (a dummy example to demonstrate the issue):
{code}
a = LOAD 'foo' AS (x, y, z);
b = LOAD 'bar' AS (i, j, k);
c = JOIN a BY x, b BY i;
d = FOREACH c GENERATE a::x, b::i;
DUMP d;
{code}
Without the AddForEach optimization, the output schema of 'c' is as follows:
{code}
a::x, a::y, a::z, b::i, b::j, b::k
{code}
But since 'a::y', 'a::z', 'b::j', and 'b::k' are not used in 'd', a foreach 
operator is inserted after each of a and b. That is:
{code}
a = LOAD 'foo' AS (x, y, z);
? = FOREACH a GENERATE x; -- no alias is set
b = LOAD 'bar' AS (i, j, k);
? = FOREACH b GENERATE i; -- no alias is set
c = JOIN ? BY x, ? BY i;
d = FOREACH c GENERATE ?::x, ?::i;
DUMP d;
{code}
But because these added foreach operators have no alias, the output schema of 
the join is wrong. The missing aliases show up as null, so printing the output 
schema of the join gives 'null::x, null::i'.
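
As a sketch of the expected behavior, assuming each inserted foreach simply 
inherits the alias of the operator it follows (an assumption for illustration; 
the actual fix may set the alias differently), the rewritten plan would be 
equivalent to:
{code}
a = LOAD 'foo' AS (x, y, z);
a = FOREACH a GENERATE x; -- assumed: inserted foreach reuses alias 'a'
b = LOAD 'bar' AS (i, j, k);
b = FOREACH b GENERATE i; -- assumed: inserted foreach reuses alias 'b'
c = JOIN a BY x, b BY i;
d = FOREACH c GENERATE a::x, b::i;
DUMP d;
{code}
With the aliases in place, the output schema of the join would be 'a::x, b::i' 
rather than 'null::x, null::i'.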



