[ https://issues.apache.org/jira/browse/PIG-480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12926312#action_12926312 ]
Thejas M Nair commented on PIG-480:
-----------------------------------
Note that the identity mapper proposed in this JIRA is useful in cases where
the join key and the group-by columns don't match, for example:
{code}
a = load 'f' as (id, v);
b = load 's' as (id, v);
c = join a by id, b by id;
d = group c by a::v;
dump d;
{code}
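As a rough illustration of why this script needs a second shuffle, here is a toy Python model of the two jobs. It is only a sketch and not Pig's actual execution plan: the sample rows, the `shuffle` and `identity_map` helpers, and the assumption that the first job's reducer writes rows already keyed by a::v are all made up for the example.

```python
from collections import defaultdict


def shuffle(pairs):
    """Group (key, value) pairs by key, standing in for the MR shuffle."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def identity_map(pairs):
    """Second-job map: pass every (key, value) pair through untouched."""
    return list(pairs)


# Hypothetical rows standing in for files 'f' and 's': (id, v) tuples.
a = [(1, "x"), (2, "y"), (3, "x")]
b = [(1, "p"), (2, "q"), (3, "r")]

# Job 1 map: tag each row with its source relation and key it by id.
job1_map_out = [(row_id, ("a", row_id, v)) for row_id, v in a]
job1_map_out += [(row_id, ("b", row_id, v)) for row_id, v in b]

# Job 1 reduce: join rows that share an id.
joined = []
for row_id, rows in shuffle(job1_map_out).items():
    left = [r for r in rows if r[0] == "a"]
    right = [r for r in rows if r[0] == "b"]
    for left_row in left:
        for right_row in right:
            # Assumption of this sketch: the reducer emits the joined row
            # keyed by a::v, so the next map has nothing left to compute.
            joined.append(
                (left_row[2],
                 (left_row[1], left_row[2], right_row[1], right_row[2]))
            )

# Job 2: the group key (a::v) differs from the join key (id), so the data
# must be shuffled again; the map step itself is a plain pass-through.
grouped = shuffle(identity_map(joined))
```

In this model the second job does no per-record work in its map phase, which is the situation where an identity mapper could stand in for a map that parses every tuple.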
> PERFORMANCE: Use identity mapper in a chain of M-R jobs
> -------------------------------------------------------
>
> Key: PIG-480
> URL: https://issues.apache.org/jira/browse/PIG-480
> Project: Pig
> Issue Type: Improvement
> Affects Versions: 0.2.0
> Reporter: Olga Natkovich
> Attachments: PIG_480.patch, PIG_480.patch, PIG_480.patch
>
>
> For scripts that run as two or more MR jobs, use an identity mapper wherever
> possible in the second and subsequent MR jobs. An identity mapper is about 50%
> faster than Pig's empty map job because it doesn't parse the data.