[
https://issues.apache.org/jira/browse/PIG-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13003615#comment-13003615
]
Thejas M Nair commented on PIG-1618:
------------------------------------
bq. Does this only change describe output, or does it have other
non-backward-compatible changes?
It changes the schema of the statement, so any statement that relies on a
non-null schema for a foreach statement will no longer work.
But I think the new behavior is correct.
For example -
{code}
grunt> describe g;
g: {group: bytearray,a: {(null)}}
grunt> f = foreach g generate $0 , flatten(a);
grunt> f2 = foreach f generate group;
-- this would give an error - Projected field [group] does not exist in schema: null.
{code}
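A possible way to adapt such scripts, as an untested sketch: refer to the field by position, or declare a schema on the load so that the flatten output has a known schema. The load path and the field names k and v below are made up for illustration, and g is assumed to be a group over a schema-less load as in the example above.
{code}
-- option 1: positional references do not depend on the schema being non-null
f = foreach g generate $0, flatten(a);
f2 = foreach f generate $0;

-- option 2: declare a schema at load time (k and v are hypothetical names)
a = load 'a' as (k, v);
g = group a by k;
f = foreach g generate group, flatten(a);
f2 = foreach f generate group; -- group should resolve, since the schema of f is known
{code}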
It also affects lineage tracing in the case of a cogroup whose inputs don't
have a schema.
{code}
a = load 'a' using PigStorage('a') ;
b = load 'a' using PigStorage('b') ;
c = cogroup a by $0, b by $0 ;
d = foreach c generate group, flatten(a), flatten(b) ;
e = foreach d generate $1 + 1, $2 + 1 ;
-- in 0.8 the load func spec of a would have been associated with $1, and that
-- of b with $2.
{code}
This also means that some valid statements that would not work with 0.8 will
now work -
{code}
a = load 'a' using PigStorage('a') ;
b = group a by $0;
c = foreach b generate group, flatten(a);
d = foreach c generate $2;
-- in 0.8 this would have given an error - Error during parsing. Out of bound
-- access. Trying to access non-existent column: 2. Schema {group:
-- bytearray,bytearray} has 2 column(s).
-- in trunk this will work as it should
{code}
> Switch to new parser generator technology
> -----------------------------------------
>
> Key: PIG-1618
> URL: https://issues.apache.org/jira/browse/PIG-1618
> Project: Pig
> Issue Type: Improvement
> Affects Versions: 0.8.0
> Reporter: Alan Gates
> Assignee: Xuefu Zhang
> Fix For: 0.9.0
>
> Attachments: NewParser-1.patch, NewParser-10.patch,
> NewParser-11.patch, NewParser-12.patch, NewParser-13.2.patch,
> NewParser-13.patch, NewParser-14.patch, NewParser-15.patch,
> NewParser-18.patch, NewParser-19.3.patch, NewParser-19.patch,
> NewParser-2.patch, NewParser-3.patch, NewParser-3.patch, NewParser-4.patch,
> NewParser-5.patch, NewParser-6.patch, NewParser-7.patch, NewParser-8.patches,
> NewParser-9.patch, antlr-3.2.jar, javadoc.patch
>
>
> There are many bugs in Pig related to the parser, particularly to bad error
> messages. After review of JavaCC we feel these will be difficult to address
> using that tool. Also, the .jjt files used by JavaCC are hard to understand
> and maintain.
> ANTLR is being reviewed as the most likely choice to move to, but other
> parsers will be reviewed as well.
> This JIRA will act as an umbrella issue for other parser issues.