Rohini Palaniswamy created PIG-3847:
---------------------------------------

             Summary: Sort avoidance for group by and join
                 Key: PIG-3847
                 URL: https://issues.apache.org/jira/browse/PIG-3847
             Project: Pig
          Issue Type: Sub-task
            Reporter: Rohini Palaniswamy


Group by and join only require that records with the same key be grouped
together. The keys themselves do not need to be sorted. If we had a Tez
Input/Output implementation that does the grouping with a hash map (memory
limits, spilling, etc. would have to be handled), it could significantly speed
up group by and join. Combiners on both the input and output side could also be
fast if serialization/deserialization is not required, and they could then be
used instead of POPartialAgg.
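
To illustrate the idea, here is a minimal Python sketch, not Pig or Tez code.
It assumes all records fit in memory (no spilling), and the names hash_group,
sort_group and hash_combine are hypothetical helpers made up for this example.
It shows that hash-based grouping yields the same groups as sort-based
grouping without ordering the keys, and that a combiner can fold values
in place on in-memory objects.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def hash_group(records):
    """Group (key, value) records by key using a hash map; keys are never sorted."""
    groups = defaultdict(list)
    for key, value in records:
        groups[key].append(value)
    return dict(groups)


def sort_group(records):
    """Baseline: sort by key, then collect runs of adjacent equal keys."""
    ordered = sorted(records, key=itemgetter(0))
    return {
        key: [value for _, value in run]
        for key, run in groupby(ordered, key=itemgetter(0))
    }


def hash_combine(records, combine, initial):
    """Fold values per key as they arrive, working on in-memory objects
    (no serialization step). Assumes `combine` is associative."""
    acc = {}
    for key, value in records:
        acc[key] = combine(acc.get(key, initial), value)
    return acc


records = [("b", 2), ("a", 1), ("b", 3), ("c", 4), ("a", 5)]

by_hash = hash_group(records)
by_sort = sort_group(records)
sums = hash_combine(records, lambda total, v: total + v, 0)
```

Both approaches produce the same key-to-values mapping. The hash version makes
a single pass and emits keys in first-seen order, while the sort version pays
for an O(n log n) sort first. A real implementation would additionally need the
memory accounting and spilling mentioned above.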



--
This message was sent by Atlassian JIRA
(v6.2#6252)
