[ https://issues.apache.org/jira/browse/TEZ-2103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14321162#comment-14321162 ]
Gopal V commented on TEZ-2103:
------------------------------

The handling of short-circuit success + exit in an out-of-order scheduler is complex enough that it should track the evolution of the scheduler. The fact that tasks which never had a container or attempt will be marked as a SUCCESS operation is complex enough to be tested within Tez (+ Tez UI / failure tolerance etc). The conditional part of the decision is user-code, which is the less complex part of this.

> Implement a Partial completion VertexManagerPlugin
> --------------------------------------------------
>
>                 Key: TEZ-2103
>                 URL: https://issues.apache.org/jira/browse/TEZ-2103
>             Project: Apache Tez
>          Issue Type: New Feature
>            Reporter: Gopal V
>
> Currently, there is no sibling communication between tasks - this implies
> that a task can be completed by the first vertex in a wave of tasks, but the
> entire wave of tasks has to complete before success can be reported.
> This occurs in limit + filter query patterns common between the data access
> engines.
> {code}
> select * from data where x > 1 limit 10;
> {code}
> will run through a full-table scan worth of tasks to generate 10 rows per
> task, to aggregate it to produce the final 10 row result.
> The VertexManager receives counters/events early enough to short-circuit the
> rest of the vertex tasks, to prevent the remainder of tasks from getting
> scheduled when the limit condition has been satisfied by an initial sub-set
> of the tasks.
> This is a specialization of the VertexManagerPlugin for this common case
> scheduling pattern.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
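
For illustration only, the short-circuit described in the issue can be sketched as a small standalone simulation. This is not the Tez VertexManagerPlugin API: `Task`, `PartialCompletionManager`, `wave_size`, and `matching_rows` are hypothetical names, and scheduling is reduced to running tasks one wave at a time. The sketch assumes each task reports a row counter when it finishes, and that the limit check (the user-code part of the decision) is evaluated between waves.

```python
from dataclasses import dataclass


@dataclass
class Task:
    task_id: int
    matching_rows: int       # rows this task would find if it ran (made-up input)
    state: str = "PENDING"
    ran: bool = False        # True only if the task was actually given a container


class PartialCompletionManager:
    """Hypothetical stand-in for a vertex manager; not the Tez plugin interface."""

    def __init__(self, tasks, limit, wave_size):
        self.tasks = tasks
        self.limit = limit
        self.wave_size = wave_size
        self.rows_seen = 0

    def limit_satisfied(self):
        # The conditional part of the decision (user code in the comment above).
        return self.rows_seen >= self.limit

    def on_task_completed(self, task):
        # Counter update from a finished task; each task emits at most `limit` rows.
        self.rows_seen += min(task.matching_rows, self.limit)

    def run(self):
        pending = list(self.tasks)
        while pending and not self.limit_satisfied():
            # Tasks in one wave run together, so the check happens between waves.
            wave, pending = pending[:self.wave_size], pending[self.wave_size:]
            for task in wave:
                task.ran = True
                task.state = "SUCCEEDED"
                self.on_task_completed(task)
        # Short-circuit: tasks that never had a container are still marked SUCCEEDED.
        for task in pending:
            task.state = "SUCCEEDED"
        return self.rows_seen


tasks = [Task(task_id=i, matching_rows=4) for i in range(12)]
manager = PartialCompletionManager(tasks, limit=10, wave_size=3)
rows = manager.run()
ran = sum(t.ran for t in tasks)
print(rows, ran)  # 12 3
```

With 12 tasks, a limit of 10, and waves of 3, the first wave already yields 12 rows, so the remaining 9 tasks are reported as SUCCEEDED without ever running. That last step is the part the comment argues is complex enough to live and be tested inside Tez.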