[ 
https://issues.apache.org/jira/browse/TEZ-4067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16844220#comment-16844220
 ] 

Ahmed Hussein commented on TEZ-4067:
------------------------------------

A concurrent AsyncDispatcher was added in TEZ-1897. By default, it is 
disabled.

To enable the concurrent dispatcher, the following setting needs to be passed to the TezConfiguration:
{noformat}
-Dtez.am.use.concurrent-dispatcher=true  {noformat}
 

 
 # The concurrent AsyncDispatcher may not be ideal for production because each 
Task/TaskAttempt update implies a notify event on the blocking queue. For status updates, 
it may be faster to do the update within one thread than to pass a new 
event between two threads.
 # The frequency of events could overwhelm the pool workers, so events would not 
be processed on time.
 # For both the synchronous and the asynchronous dispatcher, there is no mechanism to 
prevent two different workers from scanning the vertex tasks at the same time. In that case, the workers 
would duplicate work to no benefit.

 

Suggested fix

 
 # Keep the concurrent AsyncDispatcher disabled.
 # In LegacySpeculator, remove the "maybeSpeculate" call from 
"notifyAttemptStatusUpdate()". This prevents the event handler from 
executing the main speculation loop.
 # Create a thread per speculator that executes "maybeSpeculate" every 
"soonestRetryAfterSpeculate/soonestRetryAfterNoSpeculate" interval.
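
To make the suggested fix concrete, here is a minimal sketch of the idea in plain Java. It is not the Tez code: the class name PeriodicSpeculatorSketch, the progress map, the threshold and the two delay fields are all hypothetical stand-ins, and the "speculation" itself is reduced to marking an attempt id. It only shows the status-update handler recording state while a dedicated thread runs the scan and then sleeps for one of two delays.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a speculator; this is NOT Tez's LegacySpeculator.
final class PeriodicSpeculatorSketch {
    private final long retryAfterSpeculateMs;    // assumed delay after a scan that speculated
    private final long retryAfterNoSpeculateMs;  // assumed delay after a scan that did not
    private final double slowProgressThreshold;  // illustrative cutoff for a "slow" attempt

    private final Map<String, Double> progressByAttempt = new ConcurrentHashMap<>();
    private final Set<String> alreadySpeculated = ConcurrentHashMap.newKeySet();
    private final AtomicInteger scans = new AtomicInteger();
    private Thread worker;

    PeriodicSpeculatorSketch(long retryAfterSpeculateMs, long retryAfterNoSpeculateMs,
                             double slowProgressThreshold) {
        this.retryAfterSpeculateMs = retryAfterSpeculateMs;
        this.retryAfterNoSpeculateMs = retryAfterNoSpeculateMs;
        this.slowProgressThreshold = slowProgressThreshold;
    }

    // Runs on the dispatcher thread: records the update and returns immediately.
    // Note that it no longer calls maybeSpeculate().
    void notifyAttemptStatusUpdate(String attemptId, double progress) {
        progressByAttempt.put(attemptId, progress);
    }

    // One full scan; returns how many attempts were newly marked for speculation.
    int maybeSpeculate() {
        scans.incrementAndGet();
        int launched = 0;
        for (Map.Entry<String, Double> e : progressByAttempt.entrySet()) {
            if (e.getValue() < slowProgressThreshold && alreadySpeculated.add(e.getKey())) {
                launched++;
            }
        }
        return launched;
    }

    int scanCount() {
        return scans.get();
    }

    // Starts the per-speculator thread that owns the speculation loop.
    synchronized void start() {
        if (worker != null) {
            return; // already running
        }
        worker = new Thread(() -> {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    int launched = maybeSpeculate();
                    Thread.sleep(launched > 0 ? retryAfterSpeculateMs : retryAfterNoSpeculateMs);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // stop() was called; exit the loop
            }
        }, "speculator-sketch");
        worker.setDaemon(true);
        worker.start();
    }

    // Interrupts the loop and waits for the thread to finish.
    synchronized void stop() {
        if (worker == null) {
            return;
        }
        worker.interrupt();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        worker = null;
    }
}
```

With this shape, the cost of a status update on the dispatcher is one map write, and the spacing between scans is set by the speculator thread's own sleep rather than by how long the dispatcher takes to cycle through its events.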

 

> Tez Speculation decision is calculated on each update by the dispatcher
> -----------------------------------------------------------------------
>
>                 Key: TEZ-4067
>                 URL: https://issues.apache.org/jira/browse/TEZ-4067
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Ahmed Hussein
>            Assignee: Ahmed Hussein
>            Priority: Minor
>
> LegacySpeculator is an object field in VertexImpl. Therefore, all events are 
> handled synchronously by the caller (dispatcher). This implies the following:
>  # the dispatcher spends a long time executing updateStatus as it needs to 
> check the runtime estimation of the tezAttempts within the vertex.
>  # the speculator is per stage: launching a speculation may not be the optimal 
> decision. Ideally, based on resources, speculated tasks should be the ones 
> with slowest progress.
>  # the time between speculations is skewed because there is a big delay for 
> the dispatcher to complete a full cycle. Also, speculation will be more 
> aggressive compared to MR because MR waits for 
> "soonest.retry.after.speculate" whenever a task is speculated. On the other 
> hand, Tez speculates more tasks as it processes stages in parallel.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
