[ 
https://issues.apache.org/jira/browse/TEZ-14?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14213145#comment-14213145
 ] 

Siddharth Seth commented on TEZ-14:
-----------------------------------

If we're maintaining the speculation events via the Async dispatcher - I think 
it's better to generate speculator events directly, instead of routing them 
through the Vertex. 1) Attempts should ideally not talk directly to the Vertex, 
and instead should go via Tasks. This breaks (the unfortunately already broken) 
responsibility separation between TaskAttempt / Task / Vertex. 2) The Vertex 
isn't doing anything useful with these events, other than just forwarding them 
to the speculator. A dispatcher for the speculators should take care of this.

if (conf.get(... SPECULATION_ENABLED)) - should be a local (final) variable 
instead of looking up conf each time.

{code}
+          LOG.info("Issuing kill to other attempt " + attempt.getID() + " as 
attempt: " +
+            task.successfulAttempt + " has succeeded");
+          String diagnostics = null;
+          if (attempt.getLaunchTime() < successfulAttempt.getLaunchTime()) {
+            diagnostics = "Killed as speculative attempt : " + successTaId + " 
succeeded";
+          } else {
+            diagnostics = "Killed speculative attempt as attempt: " + 
successTaId + " succeeded";
+          }
{code}
This can be misinterpretted. It's possible for a speculated attempt to get 
killed, if a speculation after that succeeds. "Killed as speculative attempt" - 
does not mean the first attempt got killed.

bq. KillTransition was edited because it is legal for a leaf vertex task 
attempt to be killed after success but illegal for it to be failed after 
success since read errors cannot be reported for it.
This is now legal because of speculation, right ? - since multiple attempts may 
have been running, and the they need to be KILLED.
For the FAILED transition,  there can be a race where a Task succeeds - a kill 
is issued for pending attempts, but a pending attempt reports FAILURE before 
that.
It may be worthwhile to differentiate between KILLS / FAILS received due to 
speculation (in which case the check is invalid) and KILLS/FAILED received 
otherwise (in which the previous check holds). Follow up jira ?

The initial premise for this check is questionable though - since it makes 
assumptions on data being persisted from a leafVertex. That's independent of 
this jira though, and can be discussed elsewhere.

Rest looks good. Haven't looked at the speculation code in detail.

> Support for speculation of slow tasks
> -------------------------------------
>
>                 Key: TEZ-14
>                 URL: https://issues.apache.org/jira/browse/TEZ-14
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Bikas Saha
>            Assignee: Bikas Saha
>         Attachments: TEZ-14.1.patch, TEZ-14.2.patch, TEZ-14.3.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to