[ 
https://issues.apache.org/jira/browse/PIG-3618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-3618:
-------------------------------

    Attachment: PIG-3618-1.patch

https://reviews.apache.org/r/16165/

> Replace broadcast edges with scatter/gather edges in union
> ----------------------------------------------------------
>
>                 Key: PIG-3618
>                 URL: https://issues.apache.org/jira/browse/PIG-3618
>             Project: Pig
>          Issue Type: Sub-task
>          Components: tez
>    Affects Versions: tez-branch
>            Reporter: Cheolsoo Park
>            Assignee: Cheolsoo Park
>             Fix For: tez-branch
>
>         Attachments: PIG-3618-1.patch
>
>
> Previously, I implemented union using OnFileUnorderedKVOutput + broadcast 
> edge. This is a misuse of the broadcast edge: a broadcast edge sends every 
> record to every downstream task, so union produces duplicate records when 
> parallel is set to more than 1. We should replace this combination with 
> ShuffledMergedInput + scatter/gather edge, using the entire record as the 
> key.
> Ideally, we should implement union using OnFileUnorderedKVOutput + 
> scatter/gather edge with a round-robin partitioner, but Tez does not 
> support this yet (TEZ-661).
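
To illustrate the duplication described above, here is a minimal sketch in plain Python. It does not use any Pig or Tez API; the function names (broadcast, scatter_gather_by_record, scatter_gather_round_robin, union_output) and the sample records are hypothetical and exist only for this illustration. It models each downstream union task as a list and compares how many copies of each record the union emits under the three routing schemes mentioned in the description.

```python
from collections import Counter

def broadcast(records, parallel):
    """Broadcast routing: every downstream task receives every record."""
    return [list(records) for _ in range(parallel)]

def scatter_gather_by_record(records, parallel):
    """Scatter/gather routing keyed on the entire record:
    each record is sent to exactly one task, chosen by its hash."""
    tasks = [[] for _ in range(parallel)]
    for rec in records:
        tasks[hash(rec) % parallel].append(rec)
    return tasks

def scatter_gather_round_robin(records, parallel):
    """Scatter/gather routing with a round-robin partitioner:
    records are dealt to tasks in turn, so no key is needed."""
    tasks = [[] for _ in range(parallel)]
    for i, rec in enumerate(records):
        tasks[i % parallel].append(rec)
    return tasks

def union_output(tasks):
    """Count how many copies of each record all union tasks emit together."""
    return Counter(rec for task in tasks for rec in task)

# Hypothetical sample input standing in for the rows fed into the union.
records = [("a", 1), ("b", 2), ("c", 3)]

print(union_output(broadcast(records, 2)))
print(union_output(scatter_gather_by_record(records, 2)))
print(union_output(scatter_gather_round_robin(records, 2)))
```

In this model, broadcast emits each record once per downstream task, while both scatter/gather variants emit each record exactly once. The round-robin variant also avoids hashing or sorting on the whole record, which is presumably why the description calls it the ideal approach.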



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
