[jira] [Commented] (PIG-4049) Improve performance of Limit following an Orderby on Tez

Rohini Palaniswamy (JIRA) Mon, 07 Jul 2014 16:26:24 -0700

    [ 
https://issues.apache.org/jira/browse/PIG-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14054282#comment-14054282
 ]


Rohini Palaniswamy commented on PIG-4049:
-----------------------------------------

What I said in the earlier comment, that pulling just the data of task0 is 
sufficient, is not right. I had forgotten the actual reasoning behind what I 
proposed in the initial comment: an input that fetches from tasks in order, 
instead of custom routing. We do perform the limit on each of the tasks from 
task0 to taskn. If the limit is 10K and task0 produces 10K or more records, 
pulling only from task0 is fine. But if task0 produces fewer than 10K records, 
we need to pull the remaining records from task1 and, if needed, the tasks 
after it. I will go ahead and create the Tez jira for that. 
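
To make the in-order fetch concrete, here is a minimal Python sketch of the 
idea only. It is not Pig or Tez code: the function name 
fetch_limited_in_order and the modelling of each task's output as an 
iterable are assumptions made for illustration. It assumes the task outputs 
are already globally ordered (everything in task0 sorts before task1, and so 
on) and that each task has already applied the limit locally.

```python
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def fetch_limited_in_order(task_outputs: Iterable[Iterable[T]], limit: int) -> List[T]:
    """Collect up to `limit` records, reading task outputs in task order.

    Hypothetical helper: `task_outputs` stands in for the outputs of
    task0..taskn, each already sorted and already limited locally.
    """
    result: List[T] = []
    if limit <= 0:
        return result
    for records in task_outputs:      # task0, then task1, ... taskn
        for record in records:
            result.append(record)
            if len(result) == limit:
                return result         # limit reached: later tasks are never read
    return result                     # fewer than `limit` records exist in total
```

If task0 alone satisfies the limit, no other task output is read; otherwise 
the remainder comes from task1 and then from the tasks after it.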

> Improve performance of Limit following an Orderby on Tez
> --------------------------------------------------------
>
>                 Key: PIG-4049
>                 URL: https://issues.apache.org/jira/browse/PIG-4049
>             Project: Pig
>          Issue Type: Sub-task
>          Components: tez
>            Reporter: Rohini Palaniswamy
>             Fix For: 0.14.0
>
>
> Better algorithms can be applied to improve performance for limit following 
> an order by.
> For example:
> {code}
> A = LOAD '/tmp/data' ...;
> B = ORDER A by $0 parallel 100;
> C = LIMIT B 100;
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)
