[ 
https://issues.apache.org/jira/browse/PIG-2259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13173468#comment-13173468
 ] 

Daniel Dai commented on PIG-2259:
---------------------------------

It is semantically right if this involves a flatten. In that case we need to 
limit the usage to foreach, since it is the only operator that has the notion 
of flatten. I am a little worried that people may misuse it, but I am open to it.
                
> Black hole of multiple level dereference on "bag in bag" structure: cannot 
> reach deeper levels
> ----------------------------------------------------------------------------------------------
>
>                 Key: PIG-2259
>                 URL: https://issues.apache.org/jira/browse/PIG-2259
>             Project: Pig
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 0.9.0
>         Environment: Pig 0.9.0 local version, on Linux x86 and Mac OS X 10.7.1
>            Reporter: JArod Wen
>              Labels: bag, dereference, pig
>
> I noticed that dereference cannot reach the second level of a bag in a "bag in 
> bag" structure. Here is an example.
> Given the following script:
> a = load 'grade.dat' as (name, age, gpa);
> b = load 'rate.dat' as (state, age, rate);
> ag = group a by (name, age);
> c = cogroup ag by group.age, b by age;
> cf = foreach c generate $1.$0;
> The relation c has the following schema:
> bytearray, bag{tuple(tuple(bytearray, bytearray), bag{tuple(bytearray, 
> bytearray, bytearray)})}, bag{tuple(bytearray, bytearray, bytearray)}
> so for c, $1.$0 means the first field of the bag "ag", which is the 
> tuple group(name, age). However, $1.$0.$0 and $1.$0.$0.$0 return the 
> same tuple and never dereference any deeper. In fact, we can append an 
> arbitrary number of ".$0" after $1.$0 and still stay at the same position. 
> The reason for this interesting "black hole" of the dereference is that when 
> we dereference a bag, we automatically create another bag structure, so after 
> we obtain the "group(name, age)" tuple from the bag "ag", a bag wrapper is 
> added onto the tuple so it becomes
> bag{tuple(tuple(bytearray, bytearray))}
> Then no matter how many dereferences are appended, this structure cannot be 
> changed since every dereference just "takes off" the outer bag wrapper and 
> "puts on" the same bag wrapper. 
> For the same reason, the following script can also produce the same "black 
> hole":
> cf = foreach c generate $1.$1.$0. ... (arbitrary number of ".$0")
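
The wrapping behaviour described in the quoted report can be modelled in a few lines of Python. This is only a sketch of the semantics as the reporter explains them, not Pig's actual implementation: the function name project and the sample rows are hypothetical, a bag is modelled as a list, and a tuple as a Python tuple.

```python
def project(bag, index):
    """Model of bag dereference as described in the report: pick one field
    from every tuple in the bag, then re-wrap each picked value in a
    single-field tuple inside a new bag."""
    return [(t[index],) for t in bag]


# Hypothetical contents of the bag "ag" for one cogroup key:
# each tuple is (group(name, age), bag of original rows).
ag = [
    (("alice", 20), [("alice", 20, 3.5)]),
    (("bob", 20), [("bob", 20, 3.1)]),
]

first = project(ag, 0)      # models $1.$0
second = project(first, 0)  # models $1.$0.$0
third = project(second, 0)  # models $1.$0.$0.$0

print(first)                 # bag{tuple(tuple(name, age))}
print(second == first)       # True: the extra ".$0" changed nothing
print(third == first)        # True: still the same position
```

In this model the result of the first projection is a bag of single-field tuples, so projecting field 0 again removes the wrapper and puts the same wrapper back. That makes the result a fixed point of further ".$0" dereferences, which matches the "black hole" described above.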


        
