[ https://issues.apache.org/jira/browse/PIG-1916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13063750#comment-13063750 ]

Zhijie Shen commented on PIG-1916:
----------------------------------

Hi Daniel,

I've completed the test cases. The only difference between my patch and what 
you proposed is "1. Simple case (the one on Jira)". I didn't use that sample 
directly because it contains a command that hasn't been implemented yet. 
Instead, I modified the sample into the simplest case:
C = cogroup user by uid, session by uid;
D = foreach C {
    crossed = cross user, session;
    generate crossed;
}
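To make the semantics of this simple case concrete, here is a rough Python model of what it computes. It is a sketch under stated assumptions, not Pig's implementation: the sample data, the tuple layouts, and the helper names `cogroup` and `nested_cross` are hypothetical, and each crossed record is modeled as a concatenated tuple (user fields followed by session fields).

```python
from collections import defaultdict
from itertools import product

# Hypothetical sample data; the first field of every tuple is uid.
user = [(1, "us", 30), (1, "eu", 25), (2, "us", 40)]  # (uid, region, age)
session = [(1, "us", "10.0.0.1"), (2, "eu", "10.0.0.2"), (3, "us", "10.0.0.3")]  # (uid, region, ip)


def cogroup(left, right, key=lambda t: t[0]):
    """Model of 'C = cogroup user by uid, session by uid': one record per key,
    holding the matching bag from each input (empty if the key is absent)."""
    groups = defaultdict(lambda: ([], []))
    for t in left:
        groups[key(t)][0].append(t)
    for t in right:
        groups[key(t)][1].append(t)
    # Sort by key so the output order is deterministic in this model.
    return sorted((k, l, r) for k, (l, r) in groups.items())


def nested_cross(c):
    """Model of 'foreach C { crossed = cross user, session; generate crossed; }':
    for each group, the cartesian product of its two bags, with every pair
    concatenated into one flat tuple."""
    return [[u + s for u, s in product(user_bag, session_bag)]
            for _, user_bag, session_bag in c]


if __name__ == "__main__":
    C = cogroup(user, session)
    D = nested_cross(C)
    for (k, _, _), crossed in zip(C, D):
        print(k, crossed)
```

In this model, uid 1 yields two crossed tuples (two users times one session), uid 2 yields one, and uid 3 yields an empty bag because the user side of that group is empty.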

One more question: do we also need to run the same unit test in a mini-cluster 
environment? Currently it runs only in local mode.

Overall, I think the patch for this issue is nearly ready for submission. What 
do you think?

> Nested cross
> ------------
>
>                 Key: PIG-1916
>                 URL: https://issues.apache.org/jira/browse/PIG-1916
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>            Reporter: Daniel Dai
>            Assignee: Zhijie Shen
>              Labels: gsoc2011
>             Fix For: 0.10
>
>         Attachments: PIG-1916_1.patch, PIG-1916_2.patch, PIG-1916_3.patch, 
> PIG-1916_4.patch
>
>
> It is useful to allow cross inside a nested foreach statement. A typical use 
> case for nested foreach is: after cogrouping two relations, we want to flatten 
> the records that share the same key and then do some processing. Cross is the 
> natural way to achieve this. E.g.:
> {code}
> C = cogroup user by uid, session by uid;
> D = foreach C {
>     crossed = cross user, session; -- To flatten two input bags
>     filtered = filter crossed by user::region == session::region;
>     result = foreach filtered generate processSession(user::age, user::gender, session::ip); -- Nested foreach, Jira: PIG-1631
>     generate result;
> }
> {code}
> Without cross, users have to write a UDF that processes the bags user and 
> session directly, which is much harder than writing a UDF that processes 
> flattened tuples. This is especially true once we have nested foreach 
> statements (PIG-1631).
> This is a candidate project for Google Summer of Code 2011. More information 
> about the program can be found at http://wiki.apache.org/pig/GSoc2011

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
