[
https://issues.apache.org/jira/browse/PIG-2259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171703#comment-13171703
]
Jonathan Coveney commented on PIG-2259:
---------------------------------------
I actually think I get what Jarod means, and agree. Let's say you have a bag
b:bag{t:tuple(x:int, b:bag{t:tuple(a:int,b:int,c:int)})}
It'd be nice to be able to do
b.$0.$1 in order to grab that inner bag. You could, alternately, do b.$0,
flatten it, then access the $0 field, but that is way more clunky.
I'll look around and see how hard this would be to do (probably not terribly
difficult); the question is more whether we should support this (and I would
say we should).
> Black hole of multiple level dereference on "bag in bag" structure: cannot
> reach deeper levels
> ----------------------------------------------------------------------------------------------
>
> Key: PIG-2259
> URL: https://issues.apache.org/jira/browse/PIG-2259
> Project: Pig
> Issue Type: Bug
> Components: parser
> Affects Versions: 0.9.0
> Environment: Pig 0.9.0 local version, on Linux x86 and Mac OS X 10.7.1
> Reporter: JArod Wen
> Labels: bag, dereference, pig
>
> I noticed that dereference cannot reach the second level of a bag in a "bag in
> bag" structure. Here is an example.
> For the following script:
> a = load 'grade.dat' as (name, age, gpa);
> b = load 'rate.dat' as (state, age, rate);
> ag = group a by (name, age);
> c = cogroup ag by group.age, b by age;
> cf = foreach c generate $1.$0;
> The relation c has the schema:
> bytearray, bag{tuple(tuple(bytearray, bytearray), bag{tuple(bytearray,
> bytearray, bytearray)})}, bag{tuple(bytearray, bytearray, bytearray)}
> so for c, $1.$0 means the first field of the bag "ag", which is the
> tuple group(name, age). However, after this, $1.$0.$0 and $1.$0.$0.$0 return the
> same tuple with no deeper dereference. In fact, we can append an arbitrary number of
> ".$0" after $1.$0 and still stay at the same position.
> The reason for this interesting "black hole" of the dereference is that when we
> dereference a bag, we automatically create another bag structure, so after
> we obtain the "group(name, age)" tuple from the bag "ag", a bag wrapper is
> added onto the tuple so it becomes
> bag{tuple(tuple(bytearray, bytearray))}
> Then no matter how many dereferences are appended, this structure cannot be
> changed since every dereference just "takes off" the outer bag wrapper and
> "puts on" the same bag wrapper.
> For the same reason, the following script can also produce the same "black
> hole":
> cf = foreach c generate $1.$1.$0. ... (arbitrary number of ".$0")
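
The "takes off / puts on" behaviour described in the issue can be illustrated with a toy model. This is only a sketch in Python, not Pig's actual implementation: it assumes a bag can be modelled as a list of tuples, and that dereferencing a bag projects one field out of each tuple and re-wraps it in a one-field tuple. The function name deref_bag and the sample values are made up for illustration.

```python
# Hypothetical model, NOT Pig's implementation: a bag is a list of tuples.

def deref_bag(bag, index):
    """Project field `index` from every tuple, re-wrapping each value in a 1-tuple."""
    return [(t[index],) for t in bag]

# Field $1 of one row of relation c: the bag "ag", whose tuples hold
# (group tuple, inner bag). The values are invented sample data.
ag = [(("alice", 20), [("alice", 20, 3.5)])]

first = deref_bag(ag, 0)       # models $1.$0
second = deref_bag(first, 0)   # models $1.$0.$0
third = deref_bag(second, 0)   # models $1.$0.$0.$0

# The same happens when starting from the inner bag, as in $1.$1.$0...
inner = deref_bag(ag, 1)           # models $1.$1
inner_again = deref_bag(inner, 0)  # models $1.$1.$0

print(first == second == third)  # the structure never changes
print(inner == inner_again)
```

Under these assumptions, every further ".$0" unwraps the one-field tuple and immediately wraps the same value again, so the result is a fixed point and the fields inside the group tuple or the inner bag are never reached.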