[ https://issues.apache.org/jira/browse/PIG-4004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14164235#comment-14164235 ]
Keren Ouaknine commented on PIG-4004:
-------------------------------------

I submitted a new patch with:
a) An upgrade from the old mapred API to the new mapreduce API. This change allows a join to be expressed as one MR job rather than three (two for reading and one for the join).
b) Fixes for (many) scaling bugs. The benchmark's queries used too much memory and therefore would not execute at scale (i.e., 500 million rows).

> Upgrade the Pigmix queries from the (old) mapred API to mapreduce
> -----------------------------------------------------------------
>
>                 Key: PIG-4004
>                 URL: https://issues.apache.org/jira/browse/PIG-4004
>             Project: Pig
>          Issue Type: Bug
>          Components: tools
>    Affects Versions: 0.12.1
>            Reporter: Keren Ouaknine
>             Fix For: 0.15.0
>
>         Attachments: PIG-4004.patch
>
>
> Until now, the Pigmix queries were written using the old mapred API.
> As a result, some queries were expressed as three chained MR jobs
> instead of one. I rewrote all the queries to use the newer mapreduce API
> and optimized them along the way.
> This work is a continuation of PIG-3915.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)