[
https://issues.apache.org/jira/browse/TEZ-14?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207618#comment-14207618
]
Bikas Saha commented on TEZ-14:
-------------------------------
Addressed the main comments from both reviews.
I am going to keep sending the update as an event, because updates can trigger
speculation, and taking write locks and making direct invocations from
TaskAttempt/Task/Vertex (or vice versa) is not going to be safe. The
alternative would also have to poll all running attempts at regular intervals,
etc. If the queue on the AM is backed up then speculation is probably not the
first thing to worry about: even before the update event, the event from the
heartbeat handler to the attempt itself will be stuck. The code is written to
be safe against separating the dispatchers for vertices/tasks/attempts, so when
we make that change we should be ok. Since this update is essentially per
attempt and not per task, it seems correct to send it from the attempt instead
of going via the task.
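To illustrate the event-based approach described above, here is a minimal
sketch, not the code in the patch. All names (SpeculatorEventSketch,
AttemptStatusUpdate, Speculator, Dispatcher) are hypothetical, and the
dispatcher is a plain single-threaded queue standing in for the AM's real one.
The point is only that the attempt posts an update and returns immediately, and
the speculator sees it later when the queue is drained, so neither side calls
into the other while holding its own lock.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

final class SpeculatorEventSketch {

    /** Hypothetical per-attempt status update carried by the event. */
    static final class AttemptStatusUpdate {
        final String attemptId;
        final float progress;
        final long timestampMs;

        AttemptStatusUpdate(String attemptId, float progress, long timestampMs) {
            this.attemptId = attemptId;
            this.progress = progress;
            this.timestampMs = timestampMs;
        }
    }

    /** Hypothetical speculator: it only records the updates it receives. */
    static final class Speculator {
        final List<AttemptStatusUpdate> seen = new ArrayList<>();

        void handle(AttemptStatusUpdate update) {
            seen.add(update);
        }
    }

    /** Stand-in for a dispatcher: a queue drained by a single caller. */
    static final class Dispatcher {
        private final Queue<AttemptStatusUpdate> queue = new ArrayDeque<>();
        private final Speculator speculator;

        Dispatcher(Speculator speculator) {
            this.speculator = speculator;
        }

        /** Called by the attempt: enqueue and return, no call into the speculator. */
        void post(AttemptStatusUpdate update) {
            queue.add(update);
        }

        /** Delivers all queued updates to the speculator; returns how many were delivered. */
        int drain() {
            int delivered = 0;
            AttemptStatusUpdate update;
            while ((update = queue.poll()) != null) {
                speculator.handle(update);
                delivered++;
            }
            return delivered;
        }
    }
}
```

In this sketch a backed-up queue simply delays delivery; nothing is lost, and
the attempt never blocks on the speculator.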
Looks like the speculation/estimation code works even with progress=0 because
in that case the calculations end up looking at completed runtimes. Added
test cases for both progress updates and progress=0. Both work, but we will
likely have to look at real executions on the cluster to see if this has any
other issues.
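As a rough illustration of the progress=0 behaviour, here is a simplified
sketch under my own assumptions; it is not the estimator ported from MR. The
class and method names are hypothetical. With progress it extrapolates from
elapsed time, and with progress=0 it falls back to the mean runtime of
completed attempts.

```java
import java.util.List;

final class RuntimeEstimateSketch {

    /**
     * Estimates the total runtime of a running attempt in milliseconds.
     * Returns -1 when there is neither progress nor any completed runtime
     * to base an estimate on. (Hypothetical helper, for illustration only.)
     */
    static long estimateTotalRuntimeMs(long elapsedMs, float progress,
                                       List<Long> completedRuntimesMs) {
        if (progress > 0f) {
            // Linear extrapolation: elapsed time covers `progress` of the work.
            return (long) (elapsedMs / (double) progress);
        }
        if (completedRuntimesMs.isEmpty()) {
            return -1L;
        }
        // No progress reported: fall back to the mean of completed runtimes.
        long sum = 0L;
        for (long runtime : completedRuntimesMs) {
            sum += runtime;
        }
        return sum / completedRuntimesMs.size();
    }
}
```

For example, 1000 ms elapsed at progress 0.5 gives an estimate of 2000 ms,
while progress 0 with completed runtimes of 3000 ms and 5000 ms gives 4000 ms.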
KillTransition was edited because it is legal for a leaf vertex task attempt to
be killed after success but illegal for it to be failed after success since
read errors cannot be reported for it.
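The rule can be sketched as a small predicate. This is an illustration only,
not the actual KillTransition or state machine code; the names are
hypothetical, and the non-leaf branches are my assumption (a non-leaf attempt
has consumers that could report read errors on its output, so a failure after
success is treated as legal there).

```java
final class PostSuccessTransitionSketch {

    /** Hypothetical terminal events that may arrive after an attempt succeeded. */
    enum Event { KILL, FAIL }

    /** Returns whether the event is legal for an attempt that already succeeded. */
    static boolean isLegalAfterSuccess(Event event, boolean isLeafVertex) {
        switch (event) {
            case KILL:
                // Kill after success is legal for a leaf vertex attempt.
                // Assumption: it is also legal for non-leaf attempts.
                return true;
            case FAIL:
                // A leaf vertex has no downstream readers, so no read error
                // can be reported against its output after success.
                return !isLeafVertex;
            default:
                return false;
        }
    }
}
```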
DAGImpl is returning a TaskImpl. So it should be fine since they both are in
Impl land and not the interface land.
The port tries to keep changes to the ported code minimal so that we can track
and compare against MR while debugging issues, etc., so I am keeping the code
structure. I am also not making the values configurable since they aren't
configurable in MR and are probably already tuned. If needed we can make them
configurable later on.
Added more tests for TaskAttempt, and the DataStatistics test is ported over.
> Support for speculation of slow tasks
> -------------------------------------
>
> Key: TEZ-14
> URL: https://issues.apache.org/jira/browse/TEZ-14
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Bikas Saha
> Assignee: Bikas Saha
> Attachments: TEZ-14.1.patch, TEZ-14.2.patch, TEZ-14.3.patch
>
>