[ 
https://issues.apache.org/jira/browse/TEZ-2103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14381196#comment-14381196
 ] 

Alok Asok commented on TEZ-2103:
--------------------------------

Hi,

I have a question about this short-circuit mechanism. Does the VertexManager 
keep checking the state of the application through heartbeats until the 
limit condition is met?
If so, does it send some specially structured message to the scheduler to close 
the rest of the sibling tasks and mark them as successful? How exactly is this 
ordering done? I was going through the Tez native umbilical communication 
protocol and didn't know where to look for specifics.

Thanks
Alok Asok

> Implement a Partial completion VertexManagerPlugin
> --------------------------------------------------
>
>                 Key: TEZ-2103
>                 URL: https://issues.apache.org/jira/browse/TEZ-2103
>             Project: Apache Tez
>          Issue Type: New Feature
>            Reporter: Gopal V
>              Labels: gsoc, gsoc2015, hadoop, java, tez
>
> Currently, there is no sibling communication between tasks - this implies 
> that a task can be completed by the first vertex in a wave of tasks, but the 
> entire wave of tasks has to complete before success can be reported.
> This occurs in limit + filter query patterns common between the data access 
> engines.
> {code}
> select * from data where x > 1 limit 10;
> {code}
> will run through a full-table scan worth of tasks to generate 10 rows per 
> task, to aggregate it to produce the final 10 row result.
> The VertexManager receives counters/events early enough to short-circuit the 
> rest of the vertex tasks, to prevent the remainder of tasks from getting 
> scheduled when the limit condition has been satisfied by an initial sub-set 
> of the tasks.
> This is a specialization of the VertexManagerPlugin for this common case 
> scheduling pattern.
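
To make the idea in the description concrete, here is a minimal, self-contained
sketch of wave-based scheduling that stops once a row limit is met. It is only
a simulation under stated assumptions: it does not use the Tez
VertexManagerPlugin API, and the names LimitShortCircuitSketch, Outcome, run,
rowsPerTask, waveSize and limit are all hypothetical. It assumes the manager
learns each task's output row count when that task's wave finishes.

```java
import java.util.Arrays;

/** Hypothetical simulation of limit short-circuiting; not Tez code. */
final class LimitShortCircuitSketch {

    /** Summary of one simulated vertex run. */
    static final class Outcome {
        final int tasksRun;       // tasks that were actually scheduled
        final int tasksSkipped;   // tasks never scheduled
        final long rowsCollected; // rows kept for the final result

        Outcome(int tasksRun, int tasksSkipped, long rowsCollected) {
            this.tasksRun = tasksRun;
            this.tasksSkipped = tasksSkipped;
            this.rowsCollected = rowsCollected;
        }
    }

    /**
     * Schedules tasks in waves of waveSize. After each wave, the row counts
     * reported by the finished tasks are added up; once the total reaches
     * limit, no further waves are scheduled.
     */
    static Outcome run(int[] rowsPerTask, int waveSize, long limit) {
        if (waveSize <= 0) {
            throw new IllegalArgumentException("waveSize must be positive");
        }
        int tasksRun = 0;
        long rows = 0;
        int next = 0;
        while (next < rowsPerTask.length && rows < limit) {
            int end = Math.min(next + waveSize, rowsPerTask.length);
            for (int i = next; i < end; i++) {
                // Each task applies the LIMIT itself, so it emits at most limit rows.
                rows += Math.min(rowsPerTask[i], limit);
                tasksRun++;
            }
            next = end; // the whole wave ran; the check happens between waves
        }
        return new Outcome(tasksRun, rowsPerTask.length - tasksRun,
                Math.min(rows, limit));
    }

    public static void main(String[] args) {
        int[] rowsPerTask = {4, 3, 5, 10, 10, 10, 10, 10};
        Outcome o = run(rowsPerTask, 2, 10);
        System.out.println("tasks=" + Arrays.toString(rowsPerTask)
                + " ran=" + o.tasksRun
                + " skipped=" + o.tasksSkipped
                + " rows=" + o.rowsCollected);
    }
}
```

In this sketch the limit is only checked between waves, so tasks already
running in the current wave are allowed to finish. Whether a real plugin would
also cancel in-flight sibling tasks is exactly the open question raised in the
comment above, and the sketch does not attempt to answer it.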



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
