[
https://issues.apache.org/jira/browse/TEZ-14?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14213235#comment-14213235
]
Bikas Saha commented on TEZ-14:
-------------------------------
bq. If we're maintaining the speculation events via the Async dispatcher
There are multiple reasons I haven't made this change. It's not yet clear that
this is an issue. This only adds 1 new event per physically running task. Given
the interactions between the vertex and speculator (that involve write-locks)
it's simpler to invoke this within the vertex transition. I am expecting to move
the speculation inside the VM (vertex manager) to potentially allow plugins to have a say in it.
That will allow the threading addition to the vertex manager to take care of
offloading the processing. The direct event from attempt makes sense to me
because this is essentially an independent event of the attempt as opposed to
other events like completion where they should be limited to the task boundary.
For speculation, by definition, involved entities are aware of attempts. That
being said, I am open to moving this to the dispatcher as a follow-up if we see issues
when experimenting with this on larger clusters. Also I am fully expecting a
re-write of speculation after more stats are available and that would probably
end up having a bit more complex flow. This is just porting over legacy
speculation for legacy users trying to run Tez on large clusters where even
basic time-based speculation works OK.
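For readers unfamiliar with the idea, here is a minimal sketch of what "basic time-based speculation" could look like. It is not the code in the TEZ-14 patch and not the legacy speculator's actual estimator. The class name, method names, and the 2.0 slowdown factor are all hypothetical, chosen only to illustrate the principle: a running attempt becomes a speculation candidate once it has run much longer than attempts that already completed.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; none of these names come from Tez or the TEZ-14 patch. */
final class TimeBasedSpeculationSketch {

    /** Assumed threshold: an attempt counts as slow once it exceeds this multiple of the mean. */
    static final double SLOWDOWN_FACTOR = 2.0;

    /** Mean runtime in ms of completed attempts, or -1 when nothing has completed yet. */
    static double meanRuntimeMs(List<Long> completedRuntimesMs) {
        if (completedRuntimesMs.isEmpty()) {
            return -1;
        }
        long sum = 0;
        for (long runtime : completedRuntimesMs) {
            sum += runtime;
        }
        return (double) sum / completedRuntimesMs.size();
    }

    /** Returns true when a running attempt has exceeded SLOWDOWN_FACTOR times the mean runtime. */
    static boolean shouldSpeculate(long startTimeMs, long nowMs, List<Long> completedRuntimesMs) {
        double mean = meanRuntimeMs(completedRuntimesMs);
        if (mean < 0) {
            return false; // no baseline yet, so there is nothing to compare against
        }
        long elapsedMs = nowMs - startTimeMs;
        return elapsedMs > SLOWDOWN_FACTOR * mean;
    }

    public static void main(String[] args) {
        List<Long> completed = new ArrayList<>();
        completed.add(1000L);
        completed.add(1200L);
        completed.add(800L);
        // The mean is 1000 ms, so the threshold is 2000 ms of elapsed time.
        System.out.println(shouldSpeculate(0L, 2500L, completed)); // prints "true"
        System.out.println(shouldSpeculate(0L, 1500L, completed)); // prints "false"
    }
}
```

A check like this needs only one status update per running attempt, which is in line with the point above that speculation adds one new event per physically running task.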
bq. This is now legal because of speculation, right ?
This should always have been legal but got exercised after speculation.
bq. It may be worthwhile to differentiate between KILLS / FAILS received due to
speculation
There will be no failures due to speculation. Allowing kills of other attempts
after a leafVertex has succeeded should be fine. Failures after success should
not be fine but, as you suggest, a race condition could exist. That's orthogonal
to speculation. Created TEZ-1779 to track this.
Will fix the rest.
> Support for speculation of slow tasks
> -------------------------------------
>
> Key: TEZ-14
> URL: https://issues.apache.org/jira/browse/TEZ-14
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Bikas Saha
> Assignee: Bikas Saha
> Attachments: TEZ-14.1.patch, TEZ-14.2.patch, TEZ-14.3.patch
>
>