[ https://issues.apache.org/jira/browse/PIG-4004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171880#comment-14171880 ]
Keren Ouaknine commented on PIG-4004:
-------------------------------------
You are right: a join can be done in a single MR job even with the old API,
by using the reporter object.
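To make the single-job claim concrete, here is a minimal sketch. It is not the code from PIG-4004.patch. It is a plain-Java simulation of a reduce-side join with no Hadoop classes, so it runs standalone. The map step tags each record with the input it came from. In real Hadoop, the mapper would learn that from its input split: `reporter.getInputSplit()` in the old mapred API, or `context.getInputSplit()` in mapreduce. Sorting on (key, tag) stands in for a secondary sort. Each reduce call therefore buffers only the left-side values and streams the right-side ones, which is one common way to bound reducer memory. Whether the patch does this is not stated here. The class name `ReduceSideJoin`, the tab-separated record format, and the LEFT/RIGHT tags are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Plain-Java simulation of a one-job reduce-side join (no Hadoop classes).
// Hypothetical names throughout; this is an illustration, not the PIG-4004 patch.
final class ReduceSideJoin {

    // Source tags; a real mapper would derive these from its input split's path.
    static final int LEFT = 0;
    static final int RIGHT = 1;

    // One map output record: join key, source tag, remaining payload.
    static final class Tagged {
        final String key;
        final int tag;
        final String value;

        Tagged(String key, int tag, String value) {
            this.key = key;
            this.tag = tag;
            this.value = value;
        }
    }

    // Map phase: assumes each line is "key<TAB>value" and tags it with its source.
    static List<Tagged> map(List<String> lines, int tag) {
        List<Tagged> out = new ArrayList<>();
        for (String line : lines) {
            String[] f = line.split("\t", 2);
            out.add(new Tagged(f[0], tag, f[1]));
        }
        return out;
    }

    // Shuffle + reduce: an inner join of left and right on the key, in one pass.
    static List<String> join(List<String> left, List<String> right) {
        List<Tagged> shuffled = new ArrayList<>(map(left, LEFT));
        shuffled.addAll(map(right, RIGHT));
        // Sort by (key, tag): within each key group all LEFT records come first,
        // mimicking a secondary sort on a composite key. The sort is stable.
        shuffled.sort(Comparator.comparing((Tagged t) -> t.key).thenComparingInt(t -> t.tag));

        List<String> joined = new ArrayList<>();
        int i = 0;
        while (i < shuffled.size()) {
            String key = shuffled.get(i).key;
            // One "reduce call": buffer only the LEFT values for this key...
            List<String> buffered = new ArrayList<>();
            while (i < shuffled.size() && shuffled.get(i).key.equals(key)
                    && shuffled.get(i).tag == LEFT) {
                buffered.add(shuffled.get(i).value);
                i++;
            }
            // ...then stream the RIGHT values, emitting one row per match.
            while (i < shuffled.size() && shuffled.get(i).key.equals(key)) {
                String r = shuffled.get(i).value;
                for (String l : buffered) {
                    joined.add(key + "\t" + l + "\t" + r);
                }
                i++;
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        List<String> users = List.of("u1\talice", "u2\tbob");
        List<String> pages = List.of("u1\t/home", "u2\t/cart", "u1\t/faq", "u3\t/x");
        join(users, pages).forEach(System.out::println);
    }
}
```

Running `main` prints three joined rows: two for u1 and one for u2. Key u3 has no left-side match and is dropped, as in an inner join.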
Another reason I used the new API is that it yields shorter code that is
easier to understand.
The MR queries in the current Pig trunk (i) fail at scale because they use too
much memory, (ii) use three MR jobs to express a single join, and (iii) use
the old API.
The patch solves all of the above :)
Thanks,
Keren
> Upgrade the Pigmix queries from the (old) mapred API to mapreduce
> -----------------------------------------------------------------
>
> Key: PIG-4004
> URL: https://issues.apache.org/jira/browse/PIG-4004
> Project: Pig
> Issue Type: Bug
> Components: tools
> Affects Versions: 0.12.1
> Reporter: Keren Ouaknine
> Fix For: 0.15.0
>
> Attachments: PIG-4004.patch
>
>
> Until now, the Pigmix queries were written using the old mapred API.
> As a result, some queries were expressed with three concatenated MR jobs
> instead of one. I rewrote all the queries to use the newer mapreduce API
> and optimized them along the way.
> This work continues PIG-3915.