[
https://issues.apache.org/jira/browse/TEZ-1914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14296340#comment-14296340
]
Bikas Saha commented on TEZ-1914:
---------------------------------
The patch moves execution of vertex manager code away from the central
dispatcher. All invocations of vertex manager code are now done on a thread
pool, with callables wrapping the invocations. A shared thread pool is used to
reduce the number of threads being created while allowing multiple vertex
managers to proceed in parallel. For a given vertex manager, only one operation
is active at a given time, to prevent any one vertex manager from monopolizing
the thread pool. Recovery depended on the serialization of input information
events
before the init event. While this isn't theoretically necessary and is
something we should look at removing, for now a new transition has been added
to maintain that constraint. This was needed because the return of input
information events is no longer synchronous. Similarly, AMUserCodeException is
no longer received inline from the vertex manager code invocation. [~hitesh],
could you please look at the recovery changes? [~zjffdu], could you please look
at the exception-related changes?
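To make the threading model above concrete, here is a minimal sketch of one way
to get "one active operation per vertex manager" on a shared pool. It is not
the code from the patch: SharedPoolSketch, SerialRunner, submit and
scheduleNext are hypothetical names chosen for illustration, and plain
Runnables stand in for the callables that wrap vertex manager invocations.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch; none of these names come from the Tez code base.
final class SharedPoolSketch {

    /** Runs the tasks of one owner strictly one at a time on a shared pool. */
    static final class SerialRunner {
        private final ExecutorService sharedPool;
        private final Queue<Runnable> pending = new ArrayDeque<>();
        private boolean active; // true while one task of this runner is queued or running

        SerialRunner(ExecutorService sharedPool) {
            this.sharedPool = sharedPool;
        }

        // Queue a task; start draining only if nothing from this runner is in flight.
        synchronized void submit(Runnable task) {
            pending.add(task);
            if (!active) {
                active = true;
                scheduleNext();
            }
        }

        // Hand at most one task to the pool; chain the next one when it finishes.
        private synchronized void scheduleNext() {
            Runnable next = pending.poll();
            if (next == null) {
                active = false;
                return;
            }
            sharedPool.execute(() -> {
                try {
                    next.run();
                } finally {
                    scheduleNext();
                }
            });
        }
    }

    // Submits three tasks to each of two runners and returns the order seen per runner.
    static List<List<Integer>> run() {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        CountDownLatch done = new CountDownLatch(6);
        List<List<Integer>> seen = new ArrayList<>();
        for (int r = 0; r < 2; r++) {
            SerialRunner runner = new SerialRunner(pool);
            List<Integer> order = Collections.synchronizedList(new ArrayList<>());
            seen.add(order);
            for (int i = 0; i < 3; i++) {
                final int id = i;
                runner.submit(() -> {
                    order.add(id);
                    done.countDown();
                });
            }
        }
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Because each runner gives the pool at most one task at a time, the two runners
can make progress in parallel while each runner's own tasks complete in
submission order, so every inner list comes back as [0, 1, 2].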
This is close to done, so a quick initial review would be good before I
upload the next patch.
Existing tests should cover all code paths exercised by this change.
TestVertexImpl and TestDAGImpl had to have their supporting code changed to
account for the async operations so that the tests could continue to pass.
Essentially, these tests run the callables on the dispatcher, which lets
dispatcher.await() continue to guarantee completion of internal operations.
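The general idea behind that test setup can be sketched with a same-thread
executor: if the callable runs on the submitting thread, it has finished by the
time the submit call returns, so nothing is left pending in the background.
This is only an illustration under that assumption and is not the actual test
support code in TestVertexImpl or TestDAGImpl; InlineExecutionSketch, DIRECT,
submit and ranOnCallerThread are hypothetical names.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executor;
import java.util.concurrent.FutureTask;

// Hypothetical sketch; not the test support code from the Tez patch.
final class InlineExecutionSketch {

    // Executor that runs every task on the calling thread instead of a pool.
    static final Executor DIRECT = Runnable::run;

    // Wraps the work in a FutureTask and hands it to whichever executor is supplied.
    static <T> FutureTask<T> submit(Executor executor, Callable<T> work) {
        FutureTask<T> future = new FutureTask<>(work);
        executor.execute(future);
        return future;
    }

    // True if the work already completed, on the caller's thread, when submit returned.
    static boolean ranOnCallerThread() {
        String caller = Thread.currentThread().getName();
        FutureTask<String> result = submit(DIRECT, () -> Thread.currentThread().getName());
        if (!result.isDone()) {
            return false;
        }
        try {
            return caller.equals(result.get());
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(ranOnCallerThread());
    }
}
```

Production code would pass a real thread pool to submit, while a test passes
DIRECT, so the same call site works in both modes.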
> VertexManager logic should not run on the central dispatcher
> ------------------------------------------------------------
>
> Key: TEZ-1914
> URL: https://issues.apache.org/jira/browse/TEZ-1914
> Project: Apache Tez
> Issue Type: Task
> Reporter: Bikas Saha
> Assignee: Bikas Saha
> Attachments: TEZ-1914.1.patch
>
>