[ https://issues.apache.org/jira/browse/PIG-2259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13173475#comment-13173475 ]

JArod Wen commented on PIG-2259:
--------------------------------

Rethinking this problem now, I actually prefer Daniel's opinion.

This may come down to whether we can assume that the bag is a typed bag.
In the general case, no assumption can be made about the schema inside the
bag, so flatten() is necessary to get inside a bag of bags.

However, if the parser knows that it is a typed bag, b.$0.$1 should be preferred.
                
> Black hole of multiple level dereference on "bag in bag" structure: cannot reach deeper levels
> ----------------------------------------------------------------------------------------------
>
>                 Key: PIG-2259
>                 URL: https://issues.apache.org/jira/browse/PIG-2259
>             Project: Pig
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 0.9.0
>         Environment: Pig 0.9.0 local version, on Linux x86 and Mac OS X 10.7.1
>            Reporter: JArod Wen
>              Labels: bag, dereference, pig
>
> I noticed that dereference cannot reach the second level of a "bag in bag"
> structure. Here is an example.
> For the following script:
> a = load 'grade.dat' as (name, age, gpa);
> b = load 'rate.dat' as (state, age, rate);
> ag = group a by (name, age);
> c = cogroup ag by group.age, b by age;
> cf = foreach c generate $1.$0;
> The relation c has the schema:
> bytearray, bag{tuple(tuple(bytearray, bytearray), bag{tuple(bytearray, bytearray, bytearray)})}, bag{tuple(bytearray, bytearray, bytearray)}
> So for c, $1.$0 refers to the first field of the bag "ag", which is the
> tuple group(name, age). However, $1.$0.$0 and $1.$0.$0.$0 return that same
> tuple rather than dereferencing any deeper. In fact, any number of ".$0" can
> be appended after $1.$0 and the result stays at the same position.
> The reason for this curious "black hole" is that dereferencing a bag
> automatically creates a new bag structure. After we obtain the
> "group(name, age)" tuple from the bag "ag", a bag wrapper is added around
> the tuple, so it becomes
> bag{tuple(tuple(bytearray, bytearray))}
> No matter how many dereferences are appended, this structure cannot change,
> since each dereference just "takes off" the outer bag wrapper and "puts on"
> the same bag wrapper again.
> For the same reason, the following script produces the same "black hole":
> cf = foreach c generate $1.$1.$0. ... (arbitrary number of ".$0")
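
The wrapping behaviour described in the issue can be reproduced with a small model. This is a hypothetical Python sketch, not Pig's implementation. It assumes that a bag is a list of tuples and that dereferencing a bag by position `$i` returns a new bag of one-field tuples holding field `i`. The names `project`, `deref_chain` and `deref_tuple` and the sample values in `ag` are illustrative only. Under these assumptions, every `.$0` appended after `$1.$0` returns the same bag. Iterating over the bag's tuples first, as FLATTEN would, lets a plain tuple dereference reach the deeper field.

```python
from typing import Any, List, Sequence, Tuple

# Hypothetical model: a Pig bag is represented as a list of Python tuples.
Bag = List[Tuple[Any, ...]]


def project(bag: Bag, index: int) -> Bag:
    """Model of bag dereference (bag.$index).

    Each tuple in the bag is replaced by a new one-field tuple holding
    field `index`, so the result is again a bag of tuples.
    """
    return [(t[index],) for t in bag]


def deref_chain(bag: Bag, indices: Sequence[int]) -> Bag:
    """Apply bag dereferences left to right, e.g. [0, 0, 0] models .$0.$0.$0."""
    result = bag
    for i in indices:
        result = project(result, i)
    return result


def deref_tuple(value: Any, indices: Sequence[int]) -> Any:
    """Model of tuple dereference after FLATTEN: index straight into nested tuples."""
    for i in indices:
        value = value[i]
    return value


# Sample contents of one "ag" bag from relation c (made-up values).
ag: Bag = [(("alice", "20"), [("alice", "20", "3.5")])]

# $1.$0 wraps the group tuple in a new bag ...
print(deref_chain(ag, [0]))           # [(('alice', '20'),)]
# ... and every further .$0 unwraps and rewraps the same structure.
print(deref_chain(ag, [0, 0, 0, 0]))  # [(('alice', '20'),)]
# Iterating over the tuples first (modelling FLATTEN) lets $0.$1 reach the age field.
print([deref_tuple(t, [0, 1]) for t in ag])  # ['20']
```

In this model, the only way out of the loop is to stop treating the intermediate result as a bag. That is the typed-bag versus flatten() question raised in the comment.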


        
