[ https://issues.apache.org/jira/browse/PIG-4004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14164235#comment-14164235 ]

Keren Ouaknine commented on PIG-4004:
-------------------------------------

I submitted a new patch with:
a) an upgrade from the old mapred API to the new mapreduce API. This change allows a 
join to be expressed as one MR job rather than three (two for reading the inputs and 
one for the join);
b) fixes for many scaling bugs. The benchmark queries used too much memory, so they 
would not execute at scale (i.e., 500 million rows).
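To make point (a) concrete, below is a minimal, Hadoop-free sketch of a reduce-side join that reads both inputs in a single map/shuffle/reduce pass instead of running separate read jobs. It is an illustration under stated assumptions, not the code in PIG-4004.patch: the class name ReduceSideJoinSketch, the tab-separated "key<TAB>value" input format, and the 'L'/'R' source tags are all hypothetical. The mapper tags each record with its source, the shuffle groups values by key, and the reducer buffers only the left-side values of the current key while streaming the right-side ones, so a real reducer would hold one side of one key's group in memory rather than a whole input (the kind of concern point (b) is about).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, Hadoop-free simulation of a reduce-side join in ONE
// map/shuffle/reduce pass. Illustrative only; not the PIG-4004 patch.
public class ReduceSideJoinSketch {

    // A value tagged with the input it came from ('L' = left, 'R' = right),
    // like the value a tagging mapper would emit.
    record Tagged(char source, String payload) {}

    // Joins two inputs of "key<TAB>value" lines on key.
    public static List<String> join(List<String> left, List<String> right) {
        // Shuffle: group tagged values by key, sorted like Hadoop's shuffle.
        Map<String, List<Tagged>> shuffled = new TreeMap<>();
        // Map phase over both inputs in the same job. Mapping the left input
        // first puts all 'L' values ahead of 'R' values in every group, which
        // stands in for the secondary sort a real job would configure.
        mapPhase(left, 'L', shuffled);
        mapPhase(right, 'R', shuffled);

        // Reduce phase: buffer only the left values of the current key and
        // stream the right values against them.
        List<String> output = new ArrayList<>();
        for (Map.Entry<String, List<Tagged>> group : shuffled.entrySet()) {
            List<String> leftBuffer = new ArrayList<>();
            for (Tagged t : group.getValue()) {
                if (t.source() == 'L') {
                    leftBuffer.add(t.payload());
                } else {
                    for (String l : leftBuffer) {
                        output.add(group.getKey() + "\t" + l + "\t" + t.payload());
                    }
                }
            }
        }
        return output;
    }

    // Mapper: parse "key<TAB>value", tag it with its source, emit it by key.
    private static void mapPhase(List<String> lines, char source,
                                 Map<String, List<Tagged>> shuffled) {
        for (String line : lines) {
            String[] fields = line.split("\t", 2);
            if (fields.length < 2) {
                continue; // skip malformed lines
            }
            shuffled.computeIfAbsent(fields[0], k -> new ArrayList<>())
                    .add(new Tagged(source, fields[1]));
        }
    }

    public static void main(String[] args) {
        List<String> pageViews = List.of("alice\tpage1", "bob\tpage2", "alice\tpage3");
        List<String> users = List.of("alice\tUS", "carol\tFR");
        // Prints "alice\tpage1\tUS" then "alice\tpage3\tUS" (tab-separated).
        join(pageViews, users).forEach(System.out::println);
    }
}
```

Keys present on only one side ("bob", "carol") produce no output, as in an inner join. In the real mapreduce API, the two inputs would typically be attached to separate mappers within one Job via org.apache.hadoop.mapreduce.lib.input.MultipleInputs; the sketch above only simulates that data flow in memory.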


> Upgrade the Pigmix queries from the (old) mapred API to mapreduce
> -----------------------------------------------------------------
>
>                 Key: PIG-4004
>                 URL: https://issues.apache.org/jira/browse/PIG-4004
>             Project: Pig
>          Issue Type: Bug
>          Components: tools
>    Affects Versions: 0.12.1
>            Reporter: Keren Ouaknine
>             Fix For: 0.15.0
>
>         Attachments: PIG-4004.patch
>
>
> Until now, the Pigmix queries were written using the old mapred API. 
> As a result, some queries were expressed as three chained MR jobs 
> instead of one. I rewrote all the queries against the newer mapreduce API 
> and optimized them along the way. 
> This work continues PIG-3915.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
