[ 
https://issues.apache.org/jira/browse/PIG-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14054256#comment-14054256
 ] 

Rohini Palaniswamy commented on PIG-4049:
-----------------------------------------

[~sseth],
   Created TEZ-1264 for the OnFileSortedOutput limit optimization. As for the 
other case I mentioned, fetching the input from the source tasks in order, I 
don't think that is required. In that case, pulling only the data of task0 is 
sufficient. From what I saw yesterday, a custom edge manager can be written that 
routes task0 data to the single destination task, ignoring the output of all 
other tasks. I did not create a Tez jira because I wanted to try writing one 
first and see how it works before deciding whether a Tez jira is necessary. 
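
To illustrate the routing idea, here is a minimal Python simulation. It is not 
Tez code and does not use the Tez edge manager API. The function names, 
partition boundaries and data are all hypothetical, and the sketch assumes that 
the order-by is range partitioned and that task0 holds at least as many rows as 
the limit.

```python
import bisect
import random


def range_partition(rows, boundaries):
    """Simulate a range-partitioned ORDER BY: bucket rows by key range,
    then sort each bucket. Bucket i plays the role of source task i."""
    parts = [[] for _ in range(len(boundaries) + 1)]
    for row in rows:
        parts[bisect.bisect_right(boundaries, row)].append(row)
    return [sorted(p) for p in parts]


def route_task0_only(source_task_index):
    """Hypothetical routing rule: deliver output only from source task 0."""
    return source_task_index == 0


def limit_after_order(partitions, limit):
    """Single destination task: read only the routed partitions and
    keep the first `limit` rows."""
    fetched = []
    for idx, part in enumerate(partitions):
        if route_task0_only(idx):
            fetched.extend(part)
    return fetched[:limit]


random.seed(0)
rows = [random.randrange(10_000) for _ in range(5_000)]
boundaries = [2_500, 5_000, 7_500]  # assumed split points, 4 source tasks
partitions = range_partition(rows, boundaries)
result = limit_after_order(partitions, 100)
```

Because the partitions are ordered by key range, the rows of task0 are the 
smallest overall, so the destination task gets the same answer as a full sort 
followed by a limit while reading only one partition. If task0 could hold fewer 
rows than the limit, the rule would also need to route from the next tasks.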

> Improve performance of Limit following an Orderby on Tez
> --------------------------------------------------------
>
>                 Key: PIG-4049
>                 URL: https://issues.apache.org/jira/browse/PIG-4049
>             Project: Pig
>          Issue Type: Sub-task
>          Components: tez
>            Reporter: Rohini Palaniswamy
>             Fix For: 0.14.0
>
>
> Better algorithms can be applied to improve performance for limit following 
> an order by.
> For example:
> {code}
> A = LOAD '/tmp/data' ...;
> B = ORDER A by $0 parallel 100;
> C = LIMIT B 100;
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)
