[ 
https://issues.apache.org/jira/browse/PIG-4657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697718#comment-14697718
 ] 

Rohini Palaniswamy edited comment on PIG-4657 at 8/18/15 7:58 PM:
------------------------------------------------------------------

Tested it on a group by job whose key was a tuple consisting of three 
chararray fields (chararray comparison in BinInterSedesTupleRawComparator is 
very time consuming because it calls new String() and then compares the 
Strings). The job, which had been taking 2 hrs, came down to 30 mins.
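
To illustrate the cost difference, here is a minimal sketch. It is not the 
Pig code: the class and method names (KeyCompareSketch, compareDecoded, 
compareBytes) are hypothetical, and the keys are assumed to be plain UTF-8 
bytes rather than Pig's serialized tuple format. The first method decodes 
both keys into Strings on every comparison; the second compares the raw 
bytes without allocating anything. For group by and distinct it is enough 
that equal keys compare as equal and that the order is consistent, so the 
byte order does not have to match String order exactly.

```java
import java.nio.charset.StandardCharsets;

class KeyCompareSketch {
    // Decode-then-compare: allocates two new Strings on every call.
    static int compareDecoded(byte[] a, byte[] b) {
        String s1 = new String(a, StandardCharsets.UTF_8);
        String s2 = new String(b, StandardCharsets.UTF_8);
        return s1.compareTo(s2);
    }

    // Raw comparison: unsigned lexicographic order over the bytes,
    // with no decoding and no allocation.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        // All shared bytes are equal: the shorter key sorts first.
        return a.length - b.length;
    }

    public static void main(String[] args) {
        byte[] k1 = "apple".getBytes(StandardCharsets.UTF_8);
        byte[] k2 = "apple".getBytes(StandardCharsets.UTF_8);
        byte[] k3 = "banana".getBytes(StandardCharsets.UTF_8);

        // Equal keys compare as equal under both approaches.
        System.out.println(compareDecoded(k1, k2) == 0);
        System.out.println(compareBytes(k1, k2) == 0);

        // Both approaches agree on the sign for these ASCII keys.
        System.out.println(Integer.signum(compareDecoded(k1, k3)));
        System.out.println(Integer.signum(compareBytes(k1, k3)));
    }
}
```

The two orderings agree for ASCII keys like these, but can differ for some 
non-ASCII text, which is fine when the comparator is only used to bring 
equal keys together.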


was (Author: rohini):
Tested it for group by. A job which was taking 2 hrs came down to 30 mins.

> [Pig on Tez] Optimize GroupBy and Distinct key comparison
> ---------------------------------------------------------
>
>                 Key: PIG-4657
>                 URL: https://issues.apache.org/jira/browse/PIG-4657
>             Project: Pig
>          Issue Type: Sub-task
>            Reporter: Rohini Palaniswamy
>            Assignee: Rohini Palaniswamy
>             Fix For: 0.16.0
>
>         Attachments: PIG-4657-1.patch
>
>
>    While the bytes comparator cannot be used for joins until TEZ-2715 is 
> available, it can be used for group by and distinct if they have only one 
> Tez input. If there is more than one input due to union optimization 
> (OrderedGroupedMergedKVInput), the full comparator still has to be used, as 
> OrderedGroupedMergedKVInput uses the comparator to merge the two underlying 
> inputs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
