[ 
https://issues.apache.org/jira/browse/TEZ-14?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14213235#comment-14213235
 ] 

Bikas Saha commented on TEZ-14:
-------------------------------

bq. If we're maintaining the speculation events via the Async dispatcher
There are multiple reasons I haven't made this change. It's not yet clear that
this is an issue: it adds only 1 new event per physically running task. Given
the interactions between the vertex and the speculator (which involve write
locks), it's simpler to invoke this within the vertex transition. I expect to
move speculation inside the vertex manager (VM) so that plugins can potentially
have a say in it. The threading addition to the vertex manager would then take
care of offloading the processing. The direct event from the attempt makes sense
to me because it is essentially an independent event of the attempt, as opposed
to other events, like completion, which should be limited to the task boundary.
For speculation, by definition, the involved entities are aware of attempts.
That being said, I am open to moving this to the dispatcher as a follow-up if we
see issues when experimenting with this on larger clusters. Also, I fully expect
a rewrite of speculation after more stats are available, and that would probably
end up with a somewhat more complex flow. This is just porting over legacy
speculation for legacy users trying to run Tez on large clusters, where even
basic time-based speculation works OK.
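
For readers unfamiliar with the idea, here is a rough illustration of what
"basic time-based speculation" could look like. It is only a sketch under
stated assumptions, not the code in the attached patches: the class name, the
method name, and the slowdown-factor rule are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of time-based speculation; NOT the Tez speculator.
public class TimeBasedSpeculationSketch {

    /**
     * Picks at most one running attempt to speculate on.
     * An attempt qualifies when its elapsed time exceeds
     * meanCompletedMillis * slowdownFactor (an assumed rule).
     * Among qualifying attempts, the longest-running one is chosen.
     */
    public static Optional<String> pickCandidate(
            Map<String, Long> runningElapsedMillis,
            double meanCompletedMillis,
            double slowdownFactor) {
        double threshold = meanCompletedMillis * slowdownFactor;
        return runningElapsedMillis.entrySet().stream()
                .filter(e -> e.getValue() > threshold)
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey);
    }

    public static void main(String[] args) {
        // Attempt ids and timings are made up for illustration.
        Map<String, Long> running = new HashMap<>();
        running.put("attempt_1", 1_000L);
        running.put("attempt_2", 9_000L);
        System.out.println(pickCandidate(running, 2_000.0, 2.0));
    }
}
```

A rule this simple needs only elapsed time per running attempt and a mean
over completed tasks, which is why it is cheap to evaluate for every
physically running task.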

bq. This is now legal because of speculation, right ? 
This should always have been legal but got exercised after speculation.

bq. It may be worthwhile to differentiate between KILLS / FAILS received due to 
speculation 
There will be no failures due to speculation. Allowing kills of other attempts 
after a leafVertex has succeeded should be fine. Failures after success should 
not be fine but, as you suggest, a race condition could exist. That's orthogonal 
to speculation. Created TEZ-1779 to track this.
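
To make the distinction concrete, the rule described above can be sketched as
follows. This is an illustration only: the enum and method names are
hypothetical, and how Tez actually handles a late failure is left to TEZ-1779.

```java
// Hypothetical sketch of the rule stated above; NOT Tez's state machine.
public class PostSuccessEventSketch {

    /** Terminal events an attempt may report after success was recorded. */
    public enum AttemptEvent { KILLED, FAILED }

    /** What the receiver does with such a late event (assumed outcomes). */
    public enum Outcome { IGNORE, ERROR }

    /**
     * Kills of other attempts after success are tolerated, since a
     * speculative sibling is killed once one attempt wins. A failure
     * after success is treated as unexpected and flagged.
     */
    public static Outcome onEventAfterSuccess(AttemptEvent event) {
        switch (event) {
            case KILLED:
                return Outcome.IGNORE;
            case FAILED:
                return Outcome.ERROR;
            default:
                throw new IllegalArgumentException("Unknown event: " + event);
        }
    }

    public static void main(String[] args) {
        System.out.println(onEventAfterSuccess(AttemptEvent.KILLED));
        System.out.println(onEventAfterSuccess(AttemptEvent.FAILED));
    }
}
```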

Will fix the rest.

> Support for speculation of slow tasks
> -------------------------------------
>
>                 Key: TEZ-14
>                 URL: https://issues.apache.org/jira/browse/TEZ-14
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Bikas Saha
>            Assignee: Bikas Saha
>         Attachments: TEZ-14.1.patch, TEZ-14.2.patch, TEZ-14.3.patch
>
>