[ https://issues.apache.org/jira/browse/TEZ-1897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14508247#comment-14508247 ]

Bikas Saha commented on TEZ-1897:
---------------------------------

The latest patch builds on the previous one to actually use the concurrent
dispatcher to run Task and TaskAttempt events concurrently. A configuration
switch turns this on or off; when it is off, the code runs exactly the same
path as it does today, so with the switch off this change is very safe.

To keep things sane, all events for a given Task and its attempts are
serialized on the same thread. This is done with a serializing hash computed
from the TezTaskID, so different tasks can run on different threads while a
single task never runs concurrently with itself. That takes care of a lot of
locking issues. Next, Vertex holds a reference to its DAG, Task to its Vertex,
and TaskAttempt to its Task and Vertex. This removes the unnecessary locking
and delays incurred when these objects are fetched through the AppContext
(get the DAG/vertex/task, then look up their internal maps). This part of the
change should help in general by reducing lock contention compared to today.
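A minimal sketch of the serializing-hash idea, assuming only java.util.concurrent.
The class and method names (SerializingDispatcherSketch, dispatch,
shutdownAndWait) and the string key standing in for a TezTaskID are
hypothetical and are not the API added by the patch. The sketch only shows how
routing every event with the same key to the same single-threaded executor
keeps a task's events in order while other tasks proceed in parallel.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch, not a Tez class. Events sharing a serializing key
 * always go to the same single-threaded "lane", so they run in submission
 * order. Events with different keys may run in parallel on other lanes.
 */
public class SerializingDispatcherSketch {
    private final List<ExecutorService> lanes = new ArrayList<>();

    public SerializingDispatcherSketch(int concurrency) {
        if (concurrency < 1) {
            throw new IllegalArgumentException("concurrency must be >= 1");
        }
        for (int i = 0; i < concurrency; i++) {
            lanes.add(Executors.newSingleThreadExecutor());
        }
    }

    /** The key stands in for a TezTaskID that a task and its attempts share. */
    public void dispatch(Object serializingKey, Runnable handler) {
        // floorMod keeps the lane index non-negative for negative hash codes.
        int lane = Math.floorMod(serializingKey.hashCode(), lanes.size());
        lanes.get(lane).execute(handler);
    }

    /** Lets queued events finish, then reports whether every lane terminated. */
    public boolean shutdownAndWait() {
        for (ExecutorService lane : lanes) {
            lane.shutdown();
        }
        try {
            for (ExecutorService lane : lanes) {
                if (!lane.awaitTermination(1, TimeUnit.MINUTES)) {
                    return false;
                }
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        SerializingDispatcherSketch d = new SerializingDispatcherSketch(4);
        List<String> task7 = Collections.synchronizedList(new ArrayList<>());
        for (int i = 0; i < 5; i++) {
            final int seq = i;
            d.dispatch("task_7", () -> task7.add("event-" + seq));
            d.dispatch("task_" + i, () -> { /* unrelated task's event */ });
        }
        d.shutdownAndWait();
        System.out.println(task7); // [event-0, event-1, event-2, event-3, event-4]
    }
}
```

With a concurrency of 1, every event lands on a single lane. That mimics a
single-threaded dispatcher. Two different tasks can still hash to the same lane,
which costs some parallelism but never breaks the per-task ordering.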

Added a simulation test that runs 50,000 tasks at a concurrency of 1,000; it
runs up to 30% faster with the change than without.

The patch turns the config on for the patch test run. Otherwise it will be off
by default and marked private, so only advanced users would try it, e.g. on
large clusters running 10-20K tasks concurrently.

Please review.

> Create a concurrent version of AsyncDispatcher
> ----------------------------------------------
>
>                 Key: TEZ-1897
>                 URL: https://issues.apache.org/jira/browse/TEZ-1897
>             Project: Apache Tez
>          Issue Type: Task
>            Reporter: Bikas Saha
>            Assignee: Bikas Saha
>         Attachments: TEZ-1897.1.patch, TEZ-1897.2.patch, TEZ-1897.3.patch, 
> TEZ-1897.4.patch, TEZ-1897.5.patch, TEZ-1897.5.patch
>
>
> Currently, it processes events on a single thread. For events that can be 
> executed in parallel, e.g. vertex manager events, allowing higher concurrency 
> may be beneficial.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
