[ https://issues.apache.org/jira/browse/PIG-3847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rohini Palaniswamy updated PIG-3847:
------------------------------------
Fix Version/s: (was: 0.14.0)
> Sort avoidance for group by and join
> ------------------------------------
>
> Key: PIG-3847
> URL: https://issues.apache.org/jira/browse/PIG-3847
> Project: Pig
> Issue Type: Sub-task
> Components: tez
> Reporter: Rohini Palaniswamy
>
> Group by and join only require that records be grouped together by key;
> the keys do not need to be sorted. If we had a Tez Input/Output
> implementation that does the grouping with a hash map (memory limits,
> spilling, etc. would have to be handled), it could significantly speed up
> group by and join. Combiners on both the input and output sides could also
> be fast if serialization/deserialization is not required, and they could
> then be used instead of POPartialAgg.
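
To illustrate the idea in the description, here is a minimal sketch contrasting sort-based and hash-based grouping. It is not Pig or Tez code: the function names (`group_by_sort`, `group_by_hash`, `hash_combine_sum`) and the sample records are hypothetical, everything is held in memory, and the spilling that the description says would have to be handled is left out entirely.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def group_by_sort(records):
    """Sort-based grouping: sorts all records by key, then groups adjacent ones.

    Costs O(n log n) and yields keys in sorted order, which group by and
    join do not actually need.
    """
    ordered = sorted(records, key=itemgetter(0))
    return [(key, [value for _, value in grp])
            for key, grp in groupby(ordered, key=itemgetter(0))]


def group_by_hash(records):
    """Hash-based grouping: a single pass that appends each value to its key's bucket.

    Keys come out in first-seen order, not sorted order. Assumes everything
    fits in memory (no spilling in this sketch).
    """
    groups = defaultdict(list)
    for key, value in records:
        groups[key].append(value)
    return list(groups.items())


def hash_combine_sum(records):
    """Combiner-style partial aggregation kept in the hash map.

    Values are folded into a running sum per key as plain objects, so no
    serialization or deserialization happens between records.
    """
    totals = defaultdict(int)
    for key, value in records:
        totals[key] += value
    return dict(totals)


# Hypothetical sample input: (key, value) pairs in arrival order.
records = [("b", 1), ("a", 2), ("b", 3), ("c", 4), ("a", 5)]
```

Both grouping functions produce the same groups for the sample input; only the order in which keys are emitted differs, and that order is exactly what group by and join do not depend on.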
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)