[ https://issues.apache.org/jira/browse/PIG-3847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-3847:
----------------------------

    Fix Version/s:     (was: tez-branch)
                   0.14.0

> Sort avoidance for group by and join
> ------------------------------------
>
>                 Key: PIG-3847
>                 URL: https://issues.apache.org/jira/browse/PIG-3847
>             Project: Pig
>          Issue Type: Sub-task
>          Components: tez
>            Reporter: Rohini Palaniswamy
>             Fix For: 0.14.0
>
>
> Group by and join only require that records with the same key be grouped
> together; the keys themselves do not need to be sorted. A Tez Input/Output
> implementation that groups with a hash map (it would still have to handle
> memory limits, spilling, etc.) could significantly speed up group by and
> join. Combiners on both the input and output side could also be fast if
> serialization/deserialization is not required, and could then be used
> instead of POPartialAgg.
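
To illustrate the idea in the description, here is a minimal in-memory sketch of hash-based grouping and combiner-style partial aggregation. It is not Pig or Tez code: the class and method names (HashGroupingSketch, groupByKey, partialSum) are hypothetical, the record type is assumed to be a simple (String, Long) pair, and memory limits and spilling are deliberately left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of Pig or Tez.
final class HashGroupingSketch {

    // Groups (key, value) records by key using a hash map.
    // No sort is performed, so the key order of the result is unspecified.
    static Map<String, List<Long>> groupByKey(List<Map.Entry<String, Long>> records) {
        Map<String, List<Long>> groups = new HashMap<>();
        for (Map.Entry<String, Long> record : records) {
            groups.computeIfAbsent(record.getKey(), k -> new ArrayList<>())
                  .add(record.getValue());
        }
        return groups;
    }

    // Combiner-style partial aggregation (a per-key sum) that works directly
    // on in-memory objects, so no serialization/deserialization is involved.
    static Map<String, Long> partialSum(List<Map.Entry<String, Long>> records) {
        Map<String, Long> sums = new HashMap<>();
        for (Map.Entry<String, Long> record : records) {
            sums.merge(record.getKey(), record.getValue(), Long::sum);
        }
        return sums;
    }
}
```

Both methods make a single pass over the input and never compare keys for ordering, which is the saving the description is after. A real implementation would also need to bound the map size and spill to disk when memory runs out.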



