[ https://issues.apache.org/jira/browse/PIG-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14054266#comment-14054266 ]
Siddharth Seth commented on PIG-4049:
-------------------------------------
A custom edge that routes only a single event may not be sufficient, depending on
the size of the limit: if the first sorted partition holds fewer records than
the limit, output from later partitions is needed as well. A simpler approach
may be to have a custom version of the Input itself which, instead of providing
a unified view of the data, gives access to individual chunks along with
meta-information (taskId, etc.). Which chunks need to be fetched could then be
fully controlled by the user.
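A minimal sketch of the chunk-level Input idea, assuming the upstream ORDER BY produces range partitions so that every key in chunk i sorts before every key in chunk i+1. All type and method names here (`Chunk`, `ChunkedInput`, `fetch`, `limit`) are hypothetical illustrations, not Tez or Pig APIs. The consumer walks chunks in source-task order and stops fetching once the limit is met, which also shows why the number of chunks needed depends on the size of the limit.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: none of these types are real Tez or Pig APIs.
 * A "chunked" input exposes each upstream task's output separately,
 * tagged with its source task index, and fetches a chunk only on request.
 */
public class ChunkedLimitSketch {

    /** One upstream task's sorted output plus its meta-information. */
    record Chunk(int sourceTaskIndex, List<Integer> sortedRecords) {}

    /** Hypothetical input: the consumer decides which chunks to fetch. */
    interface ChunkedInput {
        int numChunks();
        Chunk fetch(int sourceTaskIndex); // fetches a single chunk on demand
    }

    /** In-memory stand-in that counts how many chunks were actually fetched. */
    static class InMemoryChunkedInput implements ChunkedInput {
        private final List<List<Integer>> partitions;
        int fetches = 0;

        InMemoryChunkedInput(List<List<Integer>> partitions) {
            this.partitions = partitions;
        }

        public int numChunks() { return partitions.size(); }

        public Chunk fetch(int sourceTaskIndex) {
            fetches++;
            return new Chunk(sourceTaskIndex, partitions.get(sourceTaskIndex));
        }
    }

    /**
     * LIMIT over range-partitioned, per-task-sorted data: walk chunks in
     * source-task order and stop fetching once `limit` records are collected.
     */
    static List<Integer> limit(ChunkedInput input, int limit) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < input.numChunks() && out.size() < limit; i++) {
            for (int r : input.fetch(i).sortedRecords()) {
                if (out.size() == limit) break;
                out.add(r);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Three range partitions: every key in chunk i sorts before chunk i+1.
        List<List<Integer>> parts =
            List.of(List.of(1, 2), List.of(3, 4, 5), List.of(6, 7));

        InMemoryChunkedInput small = new InMemoryChunkedInput(parts);
        System.out.println(limit(small, 2) + " fetched=" + small.fetches);

        InMemoryChunkedInput large = new InMemoryChunkedInput(parts);
        System.out.println(limit(large, 4) + " fetched=" + large.fetches);
    }
}
```

Running `main` prints `[1, 2] fetched=1` and then `[1, 2, 3, 4] fetched=2`: a limit of 2 is satisfied by one chunk, while a limit of 4 needs a second one, so routing a single event would not have been enough.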
> Improve performance of Limit following an Orderby on Tez
> --------------------------------------------------------
>
> Key: PIG-4049
> URL: https://issues.apache.org/jira/browse/PIG-4049
> Project: Pig
> Issue Type: Sub-task
> Components: tez
> Reporter: Rohini Palaniswamy
> Fix For: 0.14.0
>
>
> Better algorithms can be applied to improve the performance of a LIMIT
> following an ORDER BY.
> For example:
> {code}
> A = LOAD '/tmp/data' ...;
> B = ORDER A by $0 parallel 100;
> C = LIMIT B 100;
> {code}
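The optimization relies on one property of a range-partitioned sort: if each of the parallel tasks sorts only its own key range, reading the partitions in range order yields the global order, so `LIMIT B 100` can be answered from the lowest-ranked partitions alone. The sketch below is an illustrative check of that property, not Pig's implementation; the partition bounds, the `partitionFor` and `orderThenLimit` helpers, and the 4-way split (standing in for `parallel 100`) are all assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/**
 * Illustrative sketch (not Pig's implementation): LIMIT after a
 * range-partitioned ORDER BY only needs the lowest-ranked partitions.
 */
public class RangeLimitSketch {

    /** Partition index for `key`, given ascending inclusive upper bounds. */
    static int partitionFor(int key, int[] upperBounds) {
        for (int p = 0; p < upperBounds.length; p++) {
            if (key <= upperBounds[p]) return p;
        }
        return upperBounds.length; // last partition holds keys above all bounds
    }

    /** Range-partition, sort each partition, then read partitions in order. */
    static List<Integer> orderThenLimit(List<Integer> data, int[] upperBounds, int n) {
        List<List<Integer>> parts = new ArrayList<>();
        for (int p = 0; p <= upperBounds.length; p++) parts.add(new ArrayList<>());
        for (int key : data) parts.get(partitionFor(key, upperBounds)).add(key);

        List<Integer> out = new ArrayList<>();
        for (List<Integer> part : parts) {
            if (out.size() >= n) break; // remaining partitions are not consumed
            Collections.sort(part);     // each "task" sorts only its own range
            for (int key : part) {
                if (out.size() == n) break;
                out.add(key);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 1_000; i++) data.add(rnd.nextInt(10_000));

        int[] bounds = {2_500, 5_000, 7_500}; // 4 range partitions
        List<Integer> viaPartitions = orderThenLimit(data, bounds, 100);

        List<Integer> globallySorted = new ArrayList<>(data);
        Collections.sort(globallySorted);
        // Same first 100 records as a single global sort.
        System.out.println(viaPartitions.equals(globallySorted.subList(0, 100)));
    }
}
```

Running `main` prints `true`. The property holds only because the partitioner assigns contiguous, ordered key ranges; a hash-partitioned sort would not allow this shortcut.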