[
https://issues.apache.org/jira/browse/PIG-2259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13173468#comment-13173468
]
Daniel Dai commented on PIG-2259:
---------------------------------
It is semantically right if this involves a flatten. In that case we need to
limit its use to foreach, since that is the only operator that has the notion
of flatten. I am a little worried that people may misuse it, but I am open to it.
> Black hole of multiple level dereference on "bag in bag" structure: cannot
> reach deeper levels
> ----------------------------------------------------------------------------------------------
>
> Key: PIG-2259
> URL: https://issues.apache.org/jira/browse/PIG-2259
> Project: Pig
> Issue Type: Bug
> Components: parser
> Affects Versions: 0.9.0
> Environment: Pig 0.9.0 local version, on Linux x86 and Mac OS X 10.7.1
> Reporter: JArod Wen
> Labels: bag, dereference, pig
>
> I noticed that dereference cannot reach the second level of a bag in a "bag in
> bag" structure. Here is an example.
> Consider the following script:
> a = load 'grade.dat' as (name, age, gpa);
> b = load 'rate.dat' as (state, age, rate);
> ag = group a by (name, age);
> c = cogroup ag by group.age, b by age;
> cf = foreach c generate $1.$0;
> The relation c has the following schema:
> bytearray, bag{tuple(tuple(bytearray, bytearray), bag{tuple(bytearray,
> bytearray, bytearray)})}, bag{tuple(bytearray, bytearray, bytearray)}
> so for c, $1.$0 means the first field of the bag "ag", which is the tuple
> group(name, age). However, $1.$0.$0 and $1.$0.$0.$0 still return that same
> tuple and go no deeper. In fact, we can append an arbitrary number of ".$0"
> after $1.$0 and stay at the same position.
> The reason for this interesting "black hole" of dereference is that when we
> dereference a bag, we automatically create another bag structure, so after
> we obtain the "group(name, age)" tuple from the bag "ag", a bag wrapper is
> added onto the tuple and it becomes
> bag{tuple(tuple(bytearray, bytearray))}
> Then no matter how many dereferences are appended, this structure cannot be
> changed since every dereference just "takes off" the outer bag wrapper and
> "puts on" the same bag wrapper.
> For the same reason, the following script can also produce the same "black
> hole":
> cf = foreach c generate $1.$1.$0. ... (arbitrary number of ".$0")
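The wrap/unwrap behaviour described in the report can be modelled outside Pig. The following is a minimal Python sketch, not Pig's actual implementation: a bag is modelled as a list of tuples, `deref_bag` is a hypothetical helper that projects one field from each tuple and re-wraps it in a one-field tuple, and the sample rows in `ag` are invented for illustration. Under those assumptions, repeated ".$0" dereferences reach a fixed point after the first step.

```python
from typing import Any, List, Tuple

# A bag is modelled as a list of tuples (an assumption of this sketch).
Bag = List[Tuple[Any, ...]]


def deref_bag(bag: Bag, index: int) -> Bag:
    """Project field `index` from every tuple in the bag.

    Each projected value is wrapped in a new one-field tuple, so the
    result is again a bag of tuples. This mirrors the "takes off the
    bag wrapper, puts on the same bag wrapper" behaviour in the report.
    """
    return [(t[index],) for t in bag]


# Hypothetical contents of the "ag" bag inside relation c:
# each tuple is (group(name, age), bag of matching rows from a).
ag: Bag = [
    (("alice", "20"), [("alice", "20", "3.5")]),
    (("bob", "20"), [("bob", "20", "3.1")]),
]

first = deref_bag(ag, 0)       # models $1.$0
second = deref_bag(first, 0)   # models $1.$0.$0
third = deref_bag(second, 0)   # models $1.$0.$0.$0

# Every further ".$0" returns the same bag{tuple(tuple(name, age))}.
print(second == first)
print(third == first)

# The same fixed point appears on the $1.$1.$0... path.
inner = deref_bag(ag, 1)                  # models $1.$1
print(deref_bag(inner, 0) == inner)       # models $1.$1.$0
```

In this model the only way to reach the name field is to remove the one-field tuple wrapper before projecting again, which is loosely the role a flatten would play in the comment above.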