Rohini Palaniswamy created PIG-4627:
---------------------------------------
Summary: [Pig on Tez] Group by on multiple keys is slow and Self
join does not handle null values correctly
Key: PIG-4627
URL: https://issues.apache.org/jira/browse/PIG-4627
Project: Pig
Issue Type: Bug
Reporter: Rohini Palaniswamy
Assignee: Rohini Palaniswamy
Fix For: 0.16.0, 0.15.1
Both issues are caused by comparators: one by using a slow comparator, the
other by a comparator bug.
Tez uses PigTupleSortComparator, while MapReduce uses
PigTupleWritableComparator on the map side to compare tuples.
PigTupleSortComparator is very inefficient, which makes group by slow.
Self join does not produce the right results when nulls are involved after
PIG-4495, which writes multiple inputs into the same Tez input. We need the
https://issues.apache.org/jira/secure/attachment/12628162/PIG-3761-1.patch fix
from PIG-3761, which handles that case by comparing indexes in the raw
comparators.
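
As a rough illustration of the index comparison mentioned above, here is a
minimal Java sketch. It is not the PIG-3761 patch and does not use Pig's
classes: IndexAwareNullComparator and TaggedKey are hypothetical names, the
key is modeled as a plain Integer rather than serialized bytes, and the rule
that nulls sort first is an assumption. The only point it shows is that when
both keys are null, the comparator falls back to the input index, so null
records from different join inputs are never treated as equal.

```java
import java.util.Comparator;

// Hypothetical sketch; these are not Pig's actual classes.
final class IndexAwareNullComparator {

    /** A join key tagged with the index of the join input it came from. */
    static final class TaggedKey {
        final Integer key; // null models a null join key
        final int index;   // which join input produced the record

        TaggedKey(Integer key, int index) {
            this.key = key;
            this.index = index;
        }
    }

    /** Orders by key; when both keys are null, falls back to the input index. */
    static final Comparator<TaggedKey> COMPARATOR = (a, b) -> {
        if (a.key == null && b.key == null) {
            // A null key must not match a null key from another input,
            // so the input index decides the result.
            return Integer.compare(a.index, b.index);
        }
        if (a.key == null) {
            return -1; // assumption: nulls sort before non-null keys
        }
        if (b.key == null) {
            return 1;
        }
        // Non-null keys compare by value only, so equal keys from
        // different inputs still meet in the join.
        return a.key.compareTo(b.key);
    };

    public static void main(String[] args) {
        TaggedKey leftNull = new TaggedKey(null, 0);
        TaggedKey rightNull = new TaggedKey(null, 1);
        TaggedKey left7 = new TaggedKey(7, 0);
        TaggedKey right7 = new TaggedKey(7, 1);

        System.out.println(COMPARATOR.compare(leftNull, rightNull) != 0);
        System.out.println(COMPARATOR.compare(left7, right7) == 0);
    }
}
```

In a self join both inputs carry the same data, so without the index
fallback the two null records above would compare as equal and be joined.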