[ 
https://issues.apache.org/jira/browse/PIG-3814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-3814:
------------------------------------

    Status: Patch Available  (was: Open)

The Rank implementation in Tez differs from the MR implementation.
  * The MR implementation has one map-only job (POCounter), which sets the 
current taskId at position 0 of the tuple and the local map task counter at 
position 1. It also emits job counters for the number of records in that map 
task. JobControlCompiler collects those counters, calculates the offsets, and 
launches the next map-only job (PORank) with that offset information in the 
jobconf. 
  * The Tez implementation has three vertices. Vertex 1 outputs tuples from 
POCounter to Vertex 3. It also outputs the counters to Vertex 2, which 
calculates the offsets and broadcasts them to Vertex 3.
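
The offset calculation described above (done by JobControlCompiler in MR and 
by Vertex 2 in Tez) can be sketched roughly as an exclusive prefix sum over 
the per-task record counts. This is only an illustration, not the code in the 
patch: the class and method names are hypothetical, and it assumes task ids 
are dense integers starting at 0 and that the local counter is 1-based.

```java
import java.util.Arrays;

// Hypothetical sketch; names are not taken from the Pig source.
class RankOffsetSketch {

    // recordCounts[i] is the number of records counted by task i.
    // Returns, for each task, the number of records in all earlier tasks.
    static long[] computeOffsets(long[] recordCounts) {
        long[] offsets = new long[recordCounts.length];
        long running = 0;
        for (int i = 0; i < recordCounts.length; i++) {
            offsets[i] = running;
            running += recordCounts[i];
        }
        return offsets;
    }

    // Assumption: global rank = offset of the task + local (1-based) counter.
    static long globalRank(long[] offsets, int taskId, long localCounter) {
        return offsets[taskId] + localCounter;
    }

    public static void main(String[] args) {
        long[] offsets = computeOffsets(new long[] {3, 5, 2});
        System.out.println(Arrays.toString(offsets));  // prints [0, 3, 8]
        System.out.println(globalRank(offsets, 1, 2)); // prints 5
    }
}
```

In this picture, the rank stage only needs the small offsets array (via the 
jobconf in MR, via broadcast in Tez) plus the taskId and local counter carried 
in each tuple.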

Common (MR and Tez) performance optimizations:
   - Changed taskId from String to Integer to reduce memory overhead.
   - Previously, POCounter set the current taskId at position 0 of the tuple 
and the counter at position 1, and PORank created a new tuple of size-1 to 
remove the taskId and copied over the rest, which is a lot of overhead. The 
taskId is now set as the last element of the tuple and removed from the 
ArrayList, avoiding the copy.
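
To illustrate why moving the taskId to the end helps, here is a minimal 
sketch that uses a plain java.util.ArrayList in place of a Pig tuple. The 
class and method names are hypothetical and the field layout is simplified; 
it is not the code from the patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; a plain List<Object> stands in for a Pig tuple.
class TaskIdPositionSketch {

    // Layout with taskId first: dropping it means copying every remaining
    // field into a new list of size - 1.
    static List<Object> dropLeadingTaskIdByCopy(List<Object> tuple) {
        List<Object> out = new ArrayList<>(tuple.size() - 1);
        for (int i = 1; i < tuple.size(); i++) {
            out.add(tuple.get(i));
        }
        return out;
    }

    // Layout with taskId last: removing the final element of an ArrayList
    // shifts no other elements and allocates nothing.
    static int removeTrailingTaskId(List<Object> tuple) {
        return (Integer) tuple.remove(tuple.size() - 1);
    }

    public static void main(String[] args) {
        List<Object> tuple = new ArrayList<>(Arrays.asList("a", "b", 7));
        int taskId = removeTrailingTaskId(tuple);
        System.out.println(taskId + " " + tuple); // prints 7 [a, b]
    }
}
```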

> Implement RANK in Tez
> ---------------------
>
>                 Key: PIG-3814
>                 URL: https://issues.apache.org/jira/browse/PIG-3814
>             Project: Pig
>          Issue Type: Sub-task
>          Components: tez
>            Reporter: Rohini Palaniswamy
>            Assignee: Rohini Palaniswamy
>             Fix For: tez-branch
>
>         Attachments: PIG-3814-1.patch, PIG-3814-2.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)
