[
https://issues.apache.org/jira/browse/PIG-1654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13720029#comment-13720029
]
Sergey commented on PIG-1654:
-----------------------------
Hi, I'm developing and debugging my Pig script locally, and I've suddenly
started getting this exception.
The relevant part of the code is:
{code}
routePivotsGroupedByMsisdn = GROUP routePivots BY msisdn;
markedPivots = FOREACH routePivotsGroupedByMsisdn {
    ordered = ORDER routePivots BY ts;
    GENERATE FLATTEN(udf.filter_route_pivots(ordered))
        as (msisdn: long,          --0
            ts: long,              --1
            lac: int,              --2
            cid: int,              --3
            lon,                   --4
            lat,                   --5
            azimuth,               --6
            hpbw,                  --7
            max_dist,              --8
            cell_type: chararray,  --9
            branch_id,             --10
            center_lon: double,    --11
            center_lat: double,    --12
            tile_id: int,          --13
            zone_col: int,         --14
            zone_row: int,         --15
            is_active,             --16
            not_valid);
}
SPLIT markedPivots INTO corruptedPivots if not_valid is not null, validPivots
if not_valid is null;
groupedValidPivots = GROUP validPivots BY msisdn;
pivotsWithEndPoints = FOREACH groupedValidPivots {
    ordered = ORDER validPivots BY ts;
    GENERATE FLATTEN(udf.mark_end_points(validPivots))
        as (msisdn: long,          --0
            ts: long,              --1
            lac: int,              --2
            cid: int,              --3
            lon,                   --4
            lat,                   --5
            azimuth,               --6
            hpbw,                  --7
            max_dist,              --8
            cell_type: chararray,  --9
            branch_id,             --10
            center_lon: double,    --11
            center_lat: double,    --12
            tile_id: int,          --13
            zone_col: int,         --14
            zone_row: int,         --15
            is_active,             --16
            is_end_point: boolean,
            end_point_type: chararray);
}
--complains on msisdn..
projPivotsWithEndPoints = FOREACH pivotsWithEndPoints GENERATE msisdn, ts,
center_lon,
center_lat,
lac, cid,
cell_type, is_active,
is_end_point,
end_point_type;
STORE projPivotsWithEndPoints INTO '$validPivots' USING
org.apache.pig.piggybank.storage.avro.AvroStorage('index', '3', 'schema',
'{"name": "valid_pivots", "doc": "version 0.0.1", "type": "record", "fields": [
{"name": "msisdn", "type": "long"},
{"name": "ts", "type": "long"},
{"name": "center_lon", "type": "double"},
{"name": "center_lat", "type": "double"},
{"name": "lac", "type": "int"},
{"name": "cid", "type": "int"},
{"name": "cell_type", "type": "string"},
{"name": "is_active", "type": "boolean"},
{"name": "is_end_point", "type": "boolean"},
{"name": "end_point_type","type": "string"}
]}');
{code}
It complains about the field marked with the comment "--complains on msisdn..".
If I use the default storage instead:
STORE pivotsWithEndPoints INTO '$validPivots';
everything works fine.
What am I doing wrong?
> Pig should check schema alias duplication at any levels.
> --------------------------------------------------------
>
> Key: PIG-1654
> URL: https://issues.apache.org/jira/browse/PIG-1654
> Project: Pig
> Issue Type: Bug
> Reporter: Xuefu Zhang
> Assignee: Xuefu Zhang
> Fix For: 0.9.0
>
>
> The following script appears valid to Pig but it shouldn't:
> A = load 'file' as (a:tuple( u:int, u:bytearray, w:long), b:int, c:chararray);
> dump A;
> Pig tries to launch map/reduce jobs for this.
> However, for the following script, Pig correctly reports an error:
> A = load 'file' as (a:int, a:long, c:bytearray);
> dump A;
> Error message is:
> 2010-09-28 15:53:37,390 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR
> 1108: Duplicate schema alias: b in "A"
> Thus, Pig only checks alias duplication at the top level, which is confirmed
> by looking at the code. The right behavior is that the same check should be
> applied at all levels.
> This should be addressed in the new parser.
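
The behavior requested in the quoted description, checking for duplicate aliases at every nesting level rather than only at the top, can be illustrated with a small recursive check. This is only a sketch in Python, not Pig's actual parser code: the schema representation (a list of `(alias, type, nested_fields)` tuples) and the function name `find_duplicate_alias` are assumptions made for illustration.

```python
def find_duplicate_alias(fields, path="A"):
    """Return (alias, path) for the first duplicate alias found at any
    nesting level, or None if every level has unique aliases.

    `fields` is a hypothetical schema representation: a list of
    (alias, type_name, nested_fields) tuples, where nested_fields is
    None for scalar types and a list of fields for tuples.
    """
    seen = set()
    # First check the aliases that sit side by side at this level.
    for alias, _type_name, _nested in fields:
        if alias in seen:
            return alias, path
        seen.add(alias)
    # Then recurse into nested schemas, which is the step a
    # top-level-only check leaves out.
    for alias, _type_name, nested in fields:
        if nested is not None:
            found = find_duplicate_alias(nested, f"{path}.{alias}")
            if found is not None:
                return found
    return None


# (a:tuple(u:int, u:bytearray, w:long), b:int, c:chararray)
nested_schema = [
    ("a", "tuple", [("u", "int", None),
                    ("u", "bytearray", None),
                    ("w", "long", None)]),
    ("b", "int", None),
    ("c", "chararray", None),
]

# (a:int, a:long, c:bytearray)
flat_schema = [("a", "int", None), ("a", "long", None), ("c", "bytearray", None)]

print(find_duplicate_alias(nested_schema))
print(find_duplicate_alias(flat_schema))
```

With a top-level-only check, the first schema would pass because `a`, `b`, and `c` are distinct; the recursive step is what catches the repeated `u` inside the tuple.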