[ 
https://issues.apache.org/jira/browse/PIG-480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12926312#action_12926312
 ] 

Thejas M Nair commented on PIG-480:
-----------------------------------

Note that the identity map proposed in this jira is useful in cases where the 
join key and group-by columns don't match . 
{code}
a = load 'f' as (id, v);
b = load 's' as (id, v);
c = join a by id, b by id;
d = group c by a::v;
dump d;
{code}

> PERFORMANCE: Use identity mapper in a chain of M-R jobs
> -------------------------------------------------------
>
>                 Key: PIG-480
>                 URL: https://issues.apache.org/jira/browse/PIG-480
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.2.0
>            Reporter: Olga Natkovich
>         Attachments: PIG_480.patch, PIG_480.patch, PIG_480.patch
>
>
> For jobs with two or more MR jobs, use identity mapper wherever possible in 
> second and subsequent MR jobs. Identity mapper is about 50% than pig empty 
> map job because it doesn't parse the data. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to