[
https://issues.apache.org/jira/browse/PIG-3809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13933962#comment-13933962
]
Cheolsoo Park commented on PIG-3809:
------------------------------------
Hi Kyungho, that's a fair question. I discovered this issue while using
Lipstick, which visualizes the plan as a DAG. Here is the exception that I ran into:
{code}
<line 1, column 138> Duplicated alias in schema: null::account_id
at org.apache.pig.parser.QueryParserDriver.parseSchema(QueryParserDriver.java:121)
at org.apache.pig.impl.util.Utils.parseSchema(Utils.java:191)
at org.apache.pig.impl.util.Utils.getSchemaFromString(Utils.java:182)
at com.netflix.lipstick.model.Utils.processSchema(Utils.java:53)
at com.netflix.lipstick.model.operators.P2jLogicalRelationalOperator.setSchemaString(P2jLogicalRelationalOperator.java:325)
at com.netflix.lipstick.adaptors.LOJsonAdaptor.<init>(LOJsonAdaptor.java:72)
at com.netflix.lipstick.adaptors.LOJoinJsonAdaptor.<init>(LOJoinJsonAdaptor.java:49)
at com.netflix.lipstick.P2jPlanGenerator.convertNodeToAdaptor(P2jPlanGenerator.java:177)
at com.netflix.lipstick.P2jPlanGenerator.convertNodeToP2j(P2jPlanGenerator.java:151)
at com.netflix.lipstick.P2jPlanGenerator.<init>(P2jPlanGenerator.java:73)
at org.apache.pig.LipstickPigServer.launchPlan(LipstickPigServer.java:136)
at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1249)
{code}
In fact, anyone can build a tool similar to Lipstick, so I thought that it
would be better to fix the root cause in Pig.
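To illustrate why a missing alias can surface as "Duplicated alias in schema", here is a minimal standalone sketch in Java. It is not Pig's code: the class JoinAliasSketch and the helpers joinSchema and firstDuplicate are hypothetical names, and the field names other than account_id are made up. It only assumes that a join output schema prefixes each field with "<alias>::", as in the issue description quoted below.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not Pig's implementation.
class JoinAliasSketch {

    // Builds the join output schema by prefixing every field with "<alias>::".
    // A null alias is rendered as the literal text "null" by string concatenation.
    static List<String> joinSchema(String leftAlias, List<String> leftFields,
                                   String rightAlias, List<String> rightFields) {
        List<String> out = new ArrayList<>();
        for (String f : leftFields) {
            out.add(leftAlias + "::" + f);
        }
        for (String f : rightFields) {
            out.add(rightAlias + "::" + f);
        }
        return out;
    }

    // Returns the first field name that occurs twice, or null if all names are unique.
    static String firstDuplicate(List<String> fields) {
        Set<String> seen = new HashSet<>();
        for (String f : fields) {
            if (!seen.add(f)) {
                return f;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> left = Arrays.asList("account_id", "name");
        List<String> right = Arrays.asList("account_id", "plan");

        // Both inputs lack an alias, so the shared column name collides.
        List<String> broken = joinSchema(null, left, null, right);
        System.out.println(broken + " -> duplicate: " + firstDuplicate(broken));

        // With aliases set on both inputs, the prefixes keep the names distinct.
        List<String> fixed = joinSchema("a", left, "b", right);
        System.out.println(fixed + " -> duplicate: " + firstDuplicate(fixed));
    }
}
```

In this sketch the collision needs both join inputs to carry a column with the same name, which matches the null::account_id in the exception above. With distinct column names, such as x and i in the example below, the null prefix would be wrong but would not produce a duplicate.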
> AddForEach optimization doesn't set the alias of the added foreach
> ------------------------------------------------------------------
>
> Key: PIG-3809
> URL: https://issues.apache.org/jira/browse/PIG-3809
> Project: Pig
> Issue Type: Bug
> Components: impl
> Reporter: Cheolsoo Park
> Assignee: Cheolsoo Park
> Fix For: 0.13.0
>
> Attachments: PIG-3809-1.patch
>
>
> AddForEach inserts a foreach operator into the plan, but it doesn't set the
> alias of the added foreach. This is usually okay, but if the foreach is followed
> by a join, the missing alias confuses Pig.
> For example, consider the following query (a dummy example to demonstrate the issue):
> {code}
> a = LOAD 'foo' AS (x, y, z);
> b = LOAD 'bar' AS (i, j, k);
> c = JOIN a BY x, b BY i;
> d = FOREACH c GENERATE a::x, b::i;
> DUMP d;
> {code}
> Without AddForEach optimization, the output schema of 'c' will be as follows-
> {code}
> a::x, a::y, a::z, b::i, b::j, b::k
> {code}
> But since 'a::y', 'a::z', 'b::j', and 'b::k' are not used in 'd', a foreach
> operator will be inserted after each of a and b. That is:
> {code}
> a = LOAD 'foo' AS (x, y, z);
> ? = FOREACH a GENERATE x; -- no alias is set
> b = LOAD 'bar' AS (i, j, k);
> ? = FOREACH b GENERATE i; -- no alias is set
> c = JOIN ? BY x, ? BY i;
> d = FOREACH c GENERATE ?::x, ?::i;
> DUMP d;
> {code}
> But due to missing aliases of these added foreach operators, the output
> schema of join is messed up. In fact, they show up as null, so printing the
> output schema of join gives 'null::x, null::i'.