[ https://issues.apache.org/jira/browse/PIG-3847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daniel Dai updated PIG-3847:
----------------------------
Fix Version/s: 0.14.0 (was: tez-branch)
> Sort avoidance for group by and join
> ------------------------------------
>
> Key: PIG-3847
> URL: https://issues.apache.org/jira/browse/PIG-3847
> Project: Pig
> Issue Type: Sub-task
> Components: tez
> Reporter: Rohini Palaniswamy
> Fix For: 0.14.0
>
>
> Group by and join only require that records be grouped together by key;
> it is not necessary for the keys to be sorted. If we had a Tez
> Input/Output implementation that did the grouping using a hashmap (memory
> management, spilling, etc. would have to be handled), it could significantly
> speed up group by and join. Combiners on both the input and output side could
> also be faster if serialization/deserialization were not required, and such a
> combiner could be used instead of POPartialAgg.
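A minimal Java sketch of the idea in the description follows. It assumes String keys, Integer values and SUM as the aggregate. Records are grouped with a java.util.HashMap, so equal keys land together without any sort over the keys. A combiner-style partial sum then works on live objects without serialization. The names HashGroupBySketch, partialSum and finalSum are hypothetical. So is the maxKeys threshold, which stands in for real memory accounting and spilling. This is not Pig's or Tez's actual implementation. It requires Java 9+ for List.of and Map.entry.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: hash-based grouping and combining with no sort on keys.
 * Not Pig's or Tez's actual implementation; all names are illustrative only.
 */
public class HashGroupBySketch {

    // Group (key, value) records by key with a HashMap. Records with equal keys
    // end up in the same list, but the keys themselves are never sorted.
    static Map<String, List<Integer>> groupByHash(List<Map.Entry<String, Integer>> records) {
        Map<String, List<Integer>> groups = new HashMap<>();
        for (Map.Entry<String, Integer> r : records) {
            groups.computeIfAbsent(r.getKey(), k -> new ArrayList<>()).add(r.getValue());
        }
        return groups;
    }

    // Combiner-style partial SUM on live Java objects (no serialization).
    // maxKeys is a hypothetical stand-in for a memory limit: once the map holds
    // that many distinct keys, it is emitted as a partial result and replaced.
    static List<Map<String, Long>> partialSum(List<Map.Entry<String, Integer>> records, int maxKeys) {
        List<Map<String, Long>> flushed = new ArrayList<>();
        Map<String, Long> current = new HashMap<>();
        for (Map.Entry<String, Integer> r : records) {
            current.merge(r.getKey(), (long) r.getValue(), Long::sum);
            if (current.size() >= maxKeys) {
                flushed.add(current);
                current = new HashMap<>();
            }
        }
        if (!current.isEmpty()) {
            flushed.add(current);
        }
        return flushed;
    }

    // Final merge: partial sums for the same key are added, again via hashing.
    static Map<String, Long> finalSum(List<Map<String, Long>> partials) {
        Map<String, Long> result = new HashMap<>();
        for (Map<String, Long> p : partials) {
            p.forEach((k, v) -> result.merge(k, v, Long::sum));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> records = List.of(
                Map.entry("b", 1), Map.entry("a", 2), Map.entry("b", 3),
                Map.entry("c", 4), Map.entry("a", 5));

        Map<String, List<Integer>> groups = groupByHash(records);
        System.out.println(groups.get("a")); // [2, 5]
        System.out.println(groups.get("b")); // [1, 3]

        List<Map<String, Long>> partials = partialSum(records, 2);
        Map<String, Long> sums = finalSum(partials);
        System.out.println(partials.size() + " partial maps"); // 3 partial maps
        System.out.println(sums.get("a") + " " + sums.get("b") + " " + sums.get("c")); // 7 4 4
    }
}
```

Flushing a partial map when the threshold is reached is safe here only because SUM is associative: the final merge adds together the partial results for the same key. The sketch omits real spilling to disk and memory accounting based on object size rather than key count.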