[
https://issues.apache.org/jira/browse/PIG-3557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13870217#comment-13870217
]
Alex Bain commented on PIG-3557:
--------------------------------
1. If the previous stage using 1 reduce, no need to add one more vertex
I think in particular this should be that if the cardinality of the previous
Tez vertex (i.e. the number of run-time tasks generated by the previous vertex)
equals 1, then there is no more need to add another vertex. Does this still
sound like an optimization that we should still implement, and if so, how would
we check such a thing?
2. If the limitplan is null (ie, not the "limited order by" case), we might not
need a shuffle edge, a pass through edge should be enough if possible
I'm reading this to mean that a sort isn't necessary on the edge. Daniel - the
way you wrote this, it sounds like you want to set the edge to be a broadcast
edge rather than a scatter-gather or one-to-one edge. However, since the
parallelism of the the final vertex that implements LIMIT is one, I don't think
any of these really make a difference (correct me if I am wrong). Thus, I'm
reading this to mean that we don't need to do any sorting on the edge (LIMIT
explicitly says that it might return any order).
For my patch, I am changing the input / output types to be
OnFileUnorderedKVOutput / ShuffledUnorderedKVInput and leaving the edge type to
SCATTER_GATHER. However, I have these lines commented out with a note that they
should be uncommented after TEZ-661. Does this sound like the right thing to do?
3. Similar to PIG-1270, we can push limit to InputHandler
This is done by the LimitOptimizer already, I have gone through and verified it.
4. We also need to think through the "limited order by" case once "order by" is
implemented
There is quite a bit of code to handle LIMIT in TezCompiler::getSortJobs. Is
this code already sufficient?
> Implement optimizations for LIMIT
> ---------------------------------
>
> Key: PIG-3557
> URL: https://issues.apache.org/jira/browse/PIG-3557
> Project: Pig
> Issue Type: Sub-task
> Components: tez
> Affects Versions: tez-branch
> Reporter: Alex Bain
> Assignee: Alex Bain
>
> Implement optimizations for LIMIT when other parts of Pig-on-Tez are more
> mature. Some of the optimizations mentioned by Daniel include:
> 1. If the previous stage using 1 reduce, no need to add one more vertex
> 2. If the limitplan is null (ie, not the "limited order by" case), we might
> not need a shuffle edge, a pass through edge should be enough if possible
> 3. Similar to PIG-1270, we can push limit to InputHandler
> 4. We also need to think through the "limited order by" case once "order by"
> is implemented
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)