[ 
https://issues.apache.org/jira/browse/TEZ-2103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14321162#comment-14321162
 ] 

Gopal V commented on TEZ-2103:
------------------------------

The handling of short-circuit success + exit in an out-of-order scheduler is 
complex enough that it should track the evolution of the scheduler.

Marking tasks that never had a container or an attempt as SUCCESS is complex 
enough that it should be tested within Tez (along with the Tez UI, failure 
tolerance, etc.).

The conditional part of the decision is user code, and it is the less complex 
part of this.
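
To make that split concrete, here is a minimal Java sketch. It does not use 
the Tez API: PartialCompletionSketch, TaskState, start, finish and state are 
hypothetical names made up for illustration, and leaving RUNNING tasks 
untouched is an assumption of the sketch, not a design decision taken from 
this issue. The user-supplied condition is a single predicate; everything 
else is framework-side bookkeeping.

```java
import java.util.Arrays;
import java.util.function.LongPredicate;

// Hypothetical illustration only: none of these types are part of Tez.
final class PartialCompletionSketch {
    enum TaskState { PENDING, RUNNING, SUCCEEDED }

    private final TaskState[] states;
    private final LongPredicate userCondition; // user code: "is the vertex done?"
    private long rowsSeen = 0;                 // rows reported by finished tasks

    PartialCompletionSketch(int numTasks, LongPredicate userCondition) {
        this.states = new TaskState[numTasks];
        Arrays.fill(this.states, TaskState.PENDING);
        this.userCondition = userCondition;
    }

    // A task was given a container and an attempt.
    void start(int task) {
        states[task] = TaskState.RUNNING;
    }

    // Framework side: record a real completion, then consult the user condition.
    void finish(int task, long outputRows) {
        states[task] = TaskState.SUCCEEDED;
        rowsSeen += outputRows;
        if (userCondition.test(rowsSeen)) {
            shortCircuit();
        }
    }

    // Tasks that never had a container or attempt are marked SUCCEEDED without
    // running. RUNNING tasks are left alone (an assumption of this sketch).
    private void shortCircuit() {
        for (int i = 0; i < states.length; i++) {
            if (states[i] == TaskState.PENDING) {
                states[i] = TaskState.SUCCEEDED;
            }
        }
    }

    TaskState state(int task) {
        return states[task];
    }
}
```

The predicate is one line of user code (for example rows -> rows >= 10), 
while the state transitions for never-started tasks are the part that the UI 
and failure handling would have to understand.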

> Implement a Partial completion VertexManagerPlugin
> --------------------------------------------------
>
>                 Key: TEZ-2103
>                 URL: https://issues.apache.org/jira/browse/TEZ-2103
>             Project: Apache Tez
>          Issue Type: New Feature
>            Reporter: Gopal V
>
> Currently, there is no sibling communication between tasks - this implies 
> that a vertex can be completed by the first task in a wave of tasks, but the 
> entire wave of tasks has to complete before success can be reported.
> This occurs in limit + filter query patterns common between the data access 
> engines.
> {code}
> select * from data where x > 1 limit 10;
> {code}
> will run through a full-table scan's worth of tasks, each generating up to 
> 10 rows, and then aggregate those rows to produce the final 10-row result.
> The VertexManager receives counters/events early enough to short-circuit the 
> rest of the vertex tasks, to prevent the remainder of tasks from getting 
> scheduled when the limit condition has been satisfied by an initial sub-set 
> of the tasks.
> This is a specialization of the VertexManagerPlugin for this common-case 
> scheduling pattern.
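
The short-circuit described in the issue can be illustrated with a small, 
self-contained Java simulation. It is only a sketch and does not implement 
the real VertexManagerPlugin interface: LimitShortCircuit, onTaskCompleted, 
shouldScheduleMore and run are hypothetical names, and it assumes that each 
finished task reports its output row count (capped at the limit) before the 
next task is scheduled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: not the Tez VertexManagerPlugin API.
final class LimitShortCircuit {
    private final long limit;   // e.g. 10 for "limit 10"
    private long rowsSeen = 0;  // rows reported by completed tasks so far

    LimitShortCircuit(long limit) {
        this.limit = limit;
    }

    // Called when a task finishes and reports its output row count.
    void onTaskCompleted(long outputRows) {
        rowsSeen += outputRows;
    }

    // True while the limit is not yet satisfied and more tasks are needed.
    boolean shouldScheduleMore() {
        return rowsSeen < limit;
    }

    // Simulates scheduling tasks one at a time and returns the indexes of the
    // tasks that actually ran; the remaining tasks are never scheduled.
    static List<Integer> run(long limit, long[] rowsPerTask) {
        LimitShortCircuit manager = new LimitShortCircuit(limit);
        List<Integer> ran = new ArrayList<>();
        for (int task = 0; task < rowsPerTask.length && manager.shouldScheduleMore(); task++) {
            ran.add(task);
            // Each task emits at most `limit` rows, as in the query above.
            manager.onTaskCompleted(Math.min(rowsPerTask[task], limit));
        }
        return ran;
    }
}
```

With a limit of 10 and tasks that would match 4, 0, 7, 10 and 10 rows, only 
the first three tasks run in this simulation; the last two are skipped 
because the first three already supply enough rows.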



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
