[ https://issues.apache.org/jira/browse/PIG-4657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697718#comment-14697718 ]
Rohini Palaniswamy edited comment on PIG-4657 at 8/18/15 7:58 PM:
------------------------------------------------------------------

Tested it on a group by job that was grouping by a tuple of three chararray fields (chararray comparison in BinInterSedesTupleRawComparator is very time-consuming because it does new String() and then compares the Strings). The job, which had been taking 2 hrs, came down to 30 mins.

was (Author: rohini):
Tested it for group by. A job which was taking 2 hrs came down to 30 mins.

> [Pig on Tez] Optimize GroupBy and Distinct key comparison
> ---------------------------------------------------------
>
>                 Key: PIG-4657
>                 URL: https://issues.apache.org/jira/browse/PIG-4657
>             Project: Pig
>          Issue Type: Sub-task
>            Reporter: Rohini Palaniswamy
>            Assignee: Rohini Palaniswamy
>             Fix For: 0.16.0
>
>         Attachments: PIG-4657-1.patch
>
>
> While the bytes comparator cannot be used for joins until TEZ-2715 is
> available, it can be used for group by and distinct if they have only one
> Tez input. If there is more than one input due to union optimization
> (OrderedGroupedMergedKVInput), the full comparator still has to be used, as
> OrderedGroupedMergedKVInput uses the comparator to merge the two underlying
> inputs.
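
For readers who want to see why the bytes comparator helps, here is a rough Java sketch of the two comparison styles and of the selection rule described above. It is not the code from PIG-4657-1.patch and does not use Pig's actual serialization format: the class KeyCompareSketch and the methods utf8, compareViaStrings, compareRawBytes and canUseBytesComparator are all hypothetical names, and the keys are assumed to be plain UTF-8 encoded strings.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not Pig's BinInterSedesTupleRawComparator.
final class KeyCompareSketch {

    // Helper: encode a key as UTF-8 bytes (assumed serialization for this sketch).
    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // "Full" style: decode both keys into new String objects, then compare.
    // Every comparison allocates and decodes, which is what makes it slow.
    static int compareViaStrings(byte[] a, byte[] b) {
        String left = new String(a, StandardCharsets.UTF_8);
        String right = new String(b, StandardCharsets.UTF_8);
        return left.compareTo(right);
    }

    // "Bytes" style: compare the serialized bytes directly, unsigned,
    // with the shorter key sorting first when one is a prefix of the other.
    static int compareRawBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    // Selection rule as stated in the issue description: joins are excluded,
    // and group by / distinct qualify only with a single input.
    static boolean canUseBytesComparator(boolean isJoin, int numInputs) {
        return !isJoin && numInputs == 1;
    }

    public static void main(String[] args) {
        byte[] a = utf8("apple");
        byte[] b = utf8("apply");
        System.out.println("via strings: " + Integer.signum(compareViaStrings(a, b)));
        System.out.println("raw bytes:   " + Integer.signum(compareRawBytes(a, b)));
        System.out.println("group by, 1 input: " + canUseBytesComparator(false, 1));
        System.out.println("group by, 2 inputs: " + canUseBytesComparator(false, 2));
    }
}
```

The byte-wise order can differ from String.compareTo order for some non-ASCII text, but group by and distinct only need equal keys to compare as equal and the order to be consistent, so that difference should not matter for them.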