Rohini Palaniswamy created PIG-4627:
---------------------------------------

             Summary: [Pig on Tez] Group by on multiple keys is slow and Self join does not handle null values correctly
                 Key: PIG-4627
                 URL: https://issues.apache.org/jira/browse/PIG-4627
             Project: Pig
          Issue Type: Bug
            Reporter: Rohini Palaniswamy
            Assignee: Rohini Palaniswamy
             Fix For: 0.16.0, 0.15.1


  These issues are caused by slow comparators or by bugs in the comparators.

  On the map side, Tez uses PigTupleSortComparator to compare tuples, whereas
MapReduce uses PigTupleWritableComparator. PigTupleSortComparator is very
inefficient, which makes group by very slow.
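To illustrate why the choice of comparator matters, here is a minimal Java sketch that contrasts a comparator which deserializes both keys before comparing them with a raw comparator that compares the serialized bytes directly. The class and method names are hypothetical, the key format (a tuple of two non-negative ints written big-endian) is an assumption, and this is not the code of either Pig comparator.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration only; not PigTupleSortComparator or
// PigTupleWritableComparator. Assumed key format: two non-negative ints,
// each written as 4 big-endian bytes.
public class RawCompareSketch {

    // Serialize a two-int key into 8 big-endian bytes (ByteBuffer's default order).
    static byte[] serialize(int a, int b) {
        return ByteBuffer.allocate(8).putInt(a).putInt(b).array();
    }

    // Slow path: rebuild the key fields from bytes, then compare field by field.
    // Every comparison pays for decoding (and, in real code, object allocation).
    static int compareDeserialized(byte[] b1, byte[] b2) {
        ByteBuffer x = ByteBuffer.wrap(b1);
        ByteBuffer y = ByteBuffer.wrap(b2);
        int[] k1 = {x.getInt(), x.getInt()};
        int[] k2 = {y.getInt(), y.getInt()};
        for (int i = 0; i < k1.length; i++) {
            int c = Integer.compare(k1[i], k2[i]);
            if (c != 0) return c;
        }
        return 0;
    }

    // Fast path: unsigned lexicographic comparison of the serialized bytes.
    // This gives the same order as compareDeserialized only because the assumed
    // format uses big-endian NON-NEGATIVE ints; signed values would need extra handling.
    static int compareRaw(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        int n = Math.min(l1, l2);
        for (int i = 0; i < n; i++) {
            int c = (b1[s1 + i] & 0xff) - (b2[s2 + i] & 0xff);
            if (c != 0) return c;
        }
        return l1 - l2;
    }

    public static void main(String[] args) {
        byte[] k1 = serialize(1, 7);
        byte[] k2 = serialize(1, 9);
        System.out.println(Integer.signum(compareDeserialized(k1, k2))); // -1
        System.out.println(Integer.signum(compareRaw(k1, 0, k1.length, k2, 0, k2.length))); // -1
    }
}
```

Both methods agree on the ordering; the raw version simply avoids decoding the keys on every comparison, which matters when a sort on multiple group-by keys performs many comparisons.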

  After PIG-4495, which writes multiple inputs into the same Tez input, self join
no longer produces correct results when the join keys are null. This requires the
fix from PIG-3761
(https://issues.apache.org/jira/secure/attachment/12628162/PIG-3761-1.patch),
which handles the case by comparing input indexes in the raw comparators.
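The idea of comparing indexes can be sketched as follows. This is a minimal Java sketch, assuming a simplified key made of a nullable Integer plus the index of the input it came from; the `Key` class and both comparators are hypothetical and are not Pig's raw comparators or the PIG-3761 patch. It shows why a comparator that treats all null keys as equal would let nulls from the two sides of a self join match each other.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch; names are illustrative, not Pig classes.
public class NullIndexSketch {

    // A join key tagged with the index of the input (join side) it came from.
    static final class Key {
        final Integer value;   // null models a null join key
        final int inputIndex;  // 0 = left input, 1 = right input (assumed)
        Key(Integer value, int inputIndex) { this.value = value; this.inputIndex = inputIndex; }
        @Override public String toString() { return "(" + value + ", input " + inputIndex + ")"; }
    }

    // Buggy variant: every null compares equal, so null keys from input 0 and
    // input 1 land in the same group and would wrongly join.
    static final Comparator<Key> NAIVE_ORDER = (a, b) -> {
        if (a.value == null && b.value == null) return 0;
        if (a.value == null) return -1;
        if (b.value == null) return 1;
        return Integer.compare(a.value, b.value);
    };

    // Fixed variant: when both keys are null, fall back to the input index so
    // nulls from different inputs are never treated as the same key.
    static final Comparator<Key> KEY_ORDER = (a, b) -> {
        boolean aNull = a.value == null, bNull = b.value == null;
        if (!aNull && !bNull) return Integer.compare(a.value, b.value);
        if (aNull && bNull) return Integer.compare(a.inputIndex, b.inputIndex);
        return aNull ? -1 : 1; // nulls sort before non-null keys
    };

    public static void main(String[] args) {
        // Self join: the same relation is read twice, as input 0 and input 1.
        List<Key> keys = new ArrayList<>(List.of(
                new Key(5, 0), new Key(null, 0), new Key(5, 1), new Key(null, 1)));
        keys.sort(KEY_ORDER);
        System.out.println(keys);
        System.out.println(NAIVE_ORDER.compare(new Key(null, 0), new Key(null, 1))); // 0
        System.out.println(KEY_ORDER.compare(new Key(null, 0), new Key(null, 1)));   // -1
        System.out.println(KEY_ORDER.compare(new Key(5, 0), new Key(5, 1)));         // 0
    }
}
```

Non-null keys with the same value still compare equal across inputs, so they are grouped together and join as expected; only null keys are kept apart by their input index.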



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
