[
https://issues.apache.org/jira/browse/PIG-3379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13748184#comment-13748184
]
Daniel Dai commented on PIG-3379:
---------------------------------
The posted logical plan is missing LODistinct. It should be:
{code}
|---EventsPerMinute: (Name: LOForEach Schema: timeStamp#56:long,nbDevices#57:long,nbDevicesWatching#58:long)
| |
| (Name: LOGenerate[false,false,false] Schema: timeStamp#56:long,nbDevices#57:long,nbDevicesWatching#58:long)ColumnPrune:InputUids=[50, 49]ColumnPrune:OutputUids=[58, 57, 56]
| | |
| | (Name: Multiply Type: long Uid: 56)
| | |
| | |---group:(Name: Project Type: long Uid: 49 Input: 0 Column: (*))
| | |
| | |---(Name: Cast Type: long Uid: 54)
| | |
| | |---(Name: Constant Type: int Uid: 54)
| | |
| | (Name: UserFunc(org.apache.pig.builtin.BagSize) Type: long Uid: 57)
| | |
| | |---DistinctDevices:(Name: Project Type: bag Uid: 50 Input: 1 Column: (*))
| | |
| | (Name: UserFunc(org.apache.pig.builtin.BagSize) Type: long Uid: 58)
| | |
| | |---DistinctDevices:(Name: Project Type: bag Uid: 50 Input: 2 Column: (*))
| |
| |---(Name: LOInnerLoad[0] Schema: group#49:long)
| |
| |---DistinctDevices: (Name: LODistinct Schema: deviceId#22:chararray)
| | |
| | |---1-3: (Name: LOForEach Schema: deviceId#22:chararray)
| | | |
| | | (Name: LOGenerate[false] Schema: deviceId#22:chararray)
| | | | |
| | | | deviceId:(Name: Project Type: chararray Uid: 22 Input: 0 Column: (*))
| | | |
| | | |---(Name: LOInnerLoad[1] Schema: deviceId#22:chararray)
| | |
| | |---Events: (Name: LOInnerLoad[1] Schema: eventTime#21:long,deviceId#22:chararray,eventName#23:chararray)
{code}
The plan looks right.
I talked with [~xuefuz]. The idea is to use projectedOperator instead of the alias at the point where an alias is converted to a position. The newly introduced projectedOperator is used only during alias translation; after that, input# and col# are used as the coordinates of ProjectExpression. The patch looks good. I will commit it once the tests pass.
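To make the idea concrete, here is a minimal, hypothetical Java sketch of one way name-based lookup can go wrong when an alias is reused inside a nested FOREACH, and how binding each projection to the operator it named at parse time avoids it. The classes Op and Project, the field projectedOperator as written here, and the methods resolveByAlias, resolveByOperator and translate are illustrative assumptions only; they are not Pig's actual LogicalPlan or ProjectExpression API, and the input order is made up for the example.

```java
import java.util.List;

// Hypothetical model of the PIG-3379 idea; none of these classes are Pig's real API.
public class AliasResolutionSketch {

    // Stand-in for an operator in a nested FOREACH plan.
    static final class Op {
        final String alias;
        final String kind; // descriptive label only, e.g. "Distinct" or "Filter"

        Op(String alias, String kind) {
            this.alias = alias;
            this.kind = kind;
        }
    }

    // Stand-in for a ProjectExpression. projectedOperator records which
    // operator the alias referred to at the point where it was parsed.
    static final class Project {
        final String alias;
        final Op projectedOperator;
        int inputNum = -1; // set by alias-to-position translation

        Project(String alias, Op projectedOperator) {
            this.alias = alias;
            this.projectedOperator = projectedOperator;
        }
    }

    // Name-based strategy: resolve the alias against the finished plan.
    // A reused alias always resolves to its last definition.
    static int resolveByAlias(List<Op> inputs, Project p) {
        int found = -1;
        for (int i = 0; i < inputs.size(); i++) {
            if (inputs.get(i).alias.equals(p.alias)) {
                found = i;
            }
        }
        return found;
    }

    // Operator-based strategy: compare by identity with the captured operator,
    // so a later reuse of the same alias cannot change the result.
    static int resolveByOperator(List<Op> inputs, Project p) {
        for (int i = 0; i < inputs.size(); i++) {
            if (inputs.get(i) == p.projectedOperator) {
                return i;
            }
        }
        return -1;
    }

    // Alias translation: after this step only inputNum is consulted.
    static void translate(List<Op> inputs, Project p) {
        p.inputNum = resolveByOperator(inputs, p);
    }

    public static void main(String[] args) {
        Op group = new Op("group", "InnerLoad");
        Op distinct = new Op("DistinctDevices", "Distinct");
        Op filter = new Op("DistinctDevices", "Filter");
        List<Op> inputs = List.of(group, distinct, filter);

        // SIZE(DistinctDevices) right after DISTINCT, and again after FILTER.
        Project nbDevices = new Project("DistinctDevices", distinct);
        Project nbDevicesWatching = new Project("DistinctDevices", filter);

        System.out.println("by alias:    " + resolveByAlias(inputs, nbDevices)
                + ", " + resolveByAlias(inputs, nbDevicesWatching));

        translate(inputs, nbDevices);
        translate(inputs, nbDevicesWatching);
        System.out.println("by operator: " + nbDevices.inputNum
                + ", " + nbDevicesWatching.inputNum);
    }
}
```

Running main prints "by alias:    2, 2" and then "by operator: 1, 2". The name-based lookup sends both SIZE calls to the FILTER output, while the operator-based lookup keeps the first one on DISTINCT. Once translate has run, projectedOperator is no longer needed and only the numeric position is used.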
> Alias reuse in nested foreach causes PIG script to fail
> -------------------------------------------------------
>
> Key: PIG-3379
> URL: https://issues.apache.org/jira/browse/PIG-3379
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.11.1
> Reporter: Xuefu Zhang
> Assignee: Xuefu Zhang
> Attachments: PIG-3379-draft.patch, PIG-3379.patch
>
>
> The following script fails:
> {code:title=temp.pig}
> Events = LOAD 'x' AS (eventTime:long, deviceId:chararray, eventName:chararray);
> Events = FOREACH Events GENERATE eventTime, deviceId, eventName;
> EventsPerMinute = GROUP Events BY (eventTime / 60000);
> EventsPerMinute = FOREACH EventsPerMinute {
> DistinctDevices = DISTINCT Events.deviceId;
> nbDevices = SIZE(DistinctDevices);
> DistinctDevices = FILTER Events BY eventName == 'xuaHeartBeat';
> nbDevicesWatching = SIZE(DistinctDevices);
> GENERATE $0*60000 as timeStamp, nbDevices as nbDevices, nbDevicesWatching as nbDevicesWatching;
> }
> EventsPerMinute = FILTER EventsPerMinute BY timeStamp >= 0 AND timeStamp < 100000;
> A = FOREACH EventsPerMinute GENERATE timeStamp;
> describe A;
> {code}
> With the error:
> {code}
> 2013-07-16 11:31:20,450 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1025:
> <file /home/xzhang/Documents/temp.pig, line 14, column 37> Invalid field projection. Projected field [timeStamp] does not exist in schema: deviceId:chararray.
> {code}
> Using a distinct alias name for the second "DistinctDevices" fixes the problem. Removing the last filter statement also fixes it.