[ https://issues.apache.org/jira/browse/PIG-3847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohini Palaniswamy updated PIG-3847:
------------------------------------
    Fix Version/s:     (was: 0.14.0)

> Sort avoidance for group by and join
> ------------------------------------
>
>                 Key: PIG-3847
>                 URL: https://issues.apache.org/jira/browse/PIG-3847
>             Project: Pig
>          Issue Type: Sub-task
>          Components: tez
>            Reporter: Rohini Palaniswamy
>
> Group by and join only require that records be grouped together by key;
> the keys do not need to be sorted. If we had a Tez Input/Output
> implementation that did the grouping with a hash map (handling memory
> limits, spilling, etc.), it could significantly speed up group by and join.
> Combiners on both the input and output sides could also be faster if
> serialization/deserialization were not required, and they could then be
> used instead of POPartialAgg.
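
The hash-based grouping idea above can be sketched in plain Java (9+). This is a minimal, hypothetical illustration, not the Tez runtime library API or Pig code. The class `HashGroupSketch`, its methods, and the parameters `maxKeysInMemory` and `numPartitions` are invented for this example, and spill files are simulated with in-memory lists. When the hash table grows past its key limit it is flushed into hash partitions, and each partition is then regrouped on its own, so keys are never sorted. `combine` shows combiner-style partial aggregation that keeps running per-key results as live objects instead of serializing them.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;

// Hypothetical sketch of sort-free grouping; not a Tez or Pig API.
public class HashGroupSketch {

    /**
     * Groups values by key using hash maps only (no sorting). When the in-memory
     * table exceeds maxKeysInMemory distinct keys, it is spilled into
     * numPartitions hash partitions (simulated here with in-memory lists).
     * Keys must be non-null because spilled fragments use Map.entry.
     */
    public static <K, V> void groupByKey(List<Map.Entry<K, V>> input,
            int maxKeysInMemory, int numPartitions, BiConsumer<K, List<V>> emit) {
        Map<K, List<V>> table = new HashMap<>();
        List<List<Map.Entry<K, List<V>>>> spilled = new ArrayList<>();
        for (int p = 0; p < numPartitions; p++) {
            spilled.add(new ArrayList<>());
        }
        for (Map.Entry<K, V> rec : input) {
            table.computeIfAbsent(rec.getKey(), k -> new ArrayList<>()).add(rec.getValue());
            if (table.size() > maxKeysInMemory) {
                spill(table, spilled, numPartitions);
            }
        }
        spill(table, spilled, numPartitions);

        // Every fragment of a key hashed to the same partition, so each
        // partition can be regrouped independently and emitted.
        // (A real implementation would re-partition an oversized partition.)
        for (List<Map.Entry<K, List<V>>> partition : spilled) {
            Map<K, List<V>> merged = new HashMap<>();
            for (Map.Entry<K, List<V>> frag : partition) {
                merged.computeIfAbsent(frag.getKey(), k -> new ArrayList<>())
                      .addAll(frag.getValue());
            }
            merged.forEach(emit);
        }
    }

    // Moves every (key, values) fragment into its hash partition and clears the table.
    private static <K, V> void spill(Map<K, List<V>> table,
            List<List<Map.Entry<K, List<V>>>> spilled, int numPartitions) {
        for (Map.Entry<K, List<V>> e : table.entrySet()) {
            int p = Math.floorMod(e.getKey().hashCode(), numPartitions);
            spilled.get(p).add(Map.entry(e.getKey(), e.getValue()));
        }
        table.clear();
    }

    /**
     * Combiner-style partial aggregation: each value is folded into a live
     * per-key result object, so no serialization happens between records.
     */
    public static <K, V> Map<K, V> combine(List<Map.Entry<K, V>> input, BinaryOperator<V> op) {
        Map<K, V> partial = new HashMap<>();
        for (Map.Entry<K, V> rec : input) {
            partial.merge(rec.getKey(), rec.getValue(), op);
        }
        return partial;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> input = List.of(
                Map.entry("b", 1), Map.entry("a", 2), Map.entry("b", 3),
                Map.entry("c", 4), Map.entry("a", 5));
        // Group order follows hash order, not key order.
        groupByKey(input, 2, 4, (k, vs) -> System.out.println(k + " -> " + vs));
        System.out.println(combine(input, Integer::sum));
    }
}
```

Hash partitioning at spill time is what lets the final pass skip a merge sort: all fragments for a key land in the same partition, so each partition can be regrouped independently. A production version would also need to bound memory per partition and recursively repartition any partition that still does not fit.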



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)