[ https://issues.apache.org/jira/browse/PIG-1281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13022387#comment-13022387 ]
Xuefu Zhang commented on PIG-1281:
----------------------------------
+1
I discussed the root of the problem with Thejas. The puzzle we have right now
is caused by two issues:
1. Bag projection and tuple projection share the same syntax, so no cast can
be applied to tell them apart.
2. bytearray is used both as the unknown (unspecified) type and as the real
byte-sequence type.
The patch only catches an illegal projection when the type is known and is
neither a tuple type nor a bag type.
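
To make the scope of that check concrete, here is a rough Python sketch of the decision it describes. It is only an illustration: `FieldType`, `IllegalProjectionError` and `check_projection` are hypothetical names, not classes or methods from the Pig code base or from the attached patch.

```python
from enum import Enum


class FieldType(Enum):
    # Hypothetical, reduced set of types; not Pig's actual type constants.
    BYTEARRAY = "bytearray"  # also stands in for "type not declared"
    INT = "int"
    CHARARRAY = "chararray"
    TUPLE = "tuple"
    BAG = "bag"


class IllegalProjectionError(Exception):
    """Raised while building the plan for a projection that cannot succeed."""


def check_projection(alias, field_type):
    """Check an expression such as `alias.$0` against the type of `alias`.

    Returns True when the projection is known to be legal, and False when
    the type is bytearray, because the field might still hold a tuple or a
    bag at run time. Raises for any other known type.
    """
    if field_type in (FieldType.TUPLE, FieldType.BAG):
        return True
    if field_type is FieldType.BYTEARRAY:
        # Unknown or real bytes: cannot be decided at compile time.
        return False
    raise IllegalProjectionError(
        f"cannot project from '{alias}' of type {field_type.value}"
    )
```

Under this sketch, a field with no declared type falls into the undecidable case, which is why a cast error on such a field would still only show up in the backend.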
> Detect org.apache.pig.data.DataByteArray cannot be cast to
> org.apache.pig.data.Tuple type of errors at Compile Time during creation of
> logical plan
> ---------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: PIG-1281
> URL: https://issues.apache.org/jira/browse/PIG-1281
> Project: Pig
> Issue Type: Improvement
> Affects Versions: 0.6.0
> Reporter: Viraj Bhat
> Assignee: Thejas M Nair
> Fix For: 0.9.0
>
> Attachments: PIG-1281.1.patch
>
>
> This is an enhancement request: some simple errors could be detected at
> compile time, while the logical plan is being created, rather than in the
> backend.
> I created a script containing an error that is currently detected in the
> backend as a cast error, although it could be detected in the front end
> (group is a single element, so the projection group.$0 will not work).
> {code}
> inputdata = LOAD '/user/viraj/mymapdata' AS (col1, col2, col3, col4);
> projdata = FILTER inputdata BY (col1 is not null);
> groupprojdata = GROUP projdata BY col1;
> cleandata = FOREACH groupprojdata {
>     bagproj = projdata.col1;
>     dist_bags = DISTINCT bagproj;
>     -- group is a single element here, so group.$0 is the failing projection
>     GENERATE group.$0 as newcol1, COUNT(dist_bags) as newcol2;
> };
> cleandata1 = GROUP cleandata by newcol2;
> cleandata2 = FOREACH cleandata1 {
>     GENERATE group.$0 as finalcol1, COUNT(cleandata.newcol1) as finalcol2;
> };
> ordereddata = ORDER cleandata2 by finalcol2;
> store ordereddata into 'finalresult' using PigStorage();
> {code}