[ https://issues.apache.org/jira/browse/TEZ-1714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14187992#comment-14187992 ]
Siddharth Seth commented on TEZ-1714:
-------------------------------------
Processing by the components being notified should happen on separate threads,
so that the main dispatcher thread (or the StateChangeNotifier thread) does not
block on the operation. As long as the calls from the StateChangeNotifier are
just hand-offs, such a deadlock will not happen, since the read lock is
released right after the hand-off.
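A minimal Java sketch of the hand-off idea, assuming the listener list is guarded by a java.util.concurrent.locks.ReentrantReadWriteLock (the issue only says "read lock" and "write lock"). HandoffNotifier, Registration and the "SUCCEEDED" state string are hypothetical illustrations, not Tez APIs. The notifying thread holds the read lock only long enough to submit one task per listener to that listener's own executor; the listener code, including its call to unregister(), runs later on the executor thread, after the read lock has been released.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Consumer;

// Hypothetical illustration of a notifier that only hands off work; not the Tez StateChangeNotifier.
class HandoffNotifier {
  /** A listener callback paired with the executor its callbacks must run on. */
  static final class Registration {
    final Consumer<String> callback;
    final ExecutorService executor;

    Registration(Consumer<String> callback, ExecutorService executor) {
      this.callback = callback;
      this.executor = executor;
    }
  }

  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final List<Registration> listeners = new ArrayList<>();

  Registration register(Consumer<String> callback, ExecutorService executor) {
    Registration r = new Registration(callback, executor);
    lock.writeLock().lock();
    try {
      listeners.add(r);
    } finally {
      lock.writeLock().unlock();
    }
    return r;
  }

  void unregister(Registration r) {
    lock.writeLock().lock();
    try {
      listeners.remove(r);
    } finally {
      lock.writeLock().unlock();
    }
  }

  int listenerCount() {
    lock.readLock().lock();
    try {
      return listeners.size();
    } finally {
      lock.readLock().unlock();
    }
  }

  /** Runs on the dispatcher thread: it only enqueues work and never calls listener code itself. */
  void notifyStateChange(String newState) {
    lock.readLock().lock();
    try {
      for (Registration r : listeners) {
        r.executor.execute(() -> r.callback.accept(newState));
      }
    } finally {
      lock.readLock().unlock(); // released right after the hand-offs
    }
  }

  /** A listener unregisters itself on completion; returns the remaining count, or -1 on timeout. */
  static int demo() {
    HandoffNotifier notifier = new HandoffNotifier();
    ExecutorService listenerThread = Executors.newSingleThreadExecutor();
    CountDownLatch done = new CountDownLatch(1);
    AtomicReference<Registration> self = new AtomicReference<>();
    self.set(notifier.register(state -> {
      if ("SUCCEEDED".equals(state)) {
        notifier.unregister(self.get()); // takes the write lock on the listener's own thread
        done.countDown();
      }
    }, listenerThread));
    notifier.notifyStateChange("SUCCEEDED");
    try {
      return done.await(5, TimeUnit.SECONDS) ? notifier.listenerCount() : -1;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return -1;
    } finally {
      listenerThread.shutdown();
    }
  }

  public static void main(String[] args) {
    System.out.println("listeners left: " + demo());
  }
}
```

If the executor thread reaches unregister() while the dispatcher still holds the read lock, it simply waits for the brief hand-off loop to finish instead of deadlocking, because the two locks are held by different threads.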
Pasting from TEZ-1447
bq. In terms of threading issues, there should be a note in the patch. The
'StatusNotifier' is a very lightweight call into Tez internal
components. At the moment the only registered component is the
RootInputInitializerManager. That needs to change to send notifications via a
thread, and will end up routing events via a thread as well. Similarly, when
VertexPlugins / EdgePlugins make use of this, it's their responsibility to
set up threading to send these events to the user code. At that point, the
StatusNotifier itself (and in effect the dispatcher thread) would never make
calls into user code.
StatusNotifier could set up its own queue, but it looks like VertexManager and
EdgeManager will eventually need to run on separate threads so that calls
like onVertexStarted / routeEvents don't block the dispatcher thread.
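For comparison, a rough sketch of the "own queue" alternative, again assuming a ReentrantReadWriteLock-guarded listener list; QueueingNotifier and its methods are hypothetical names, not Tez code. The dispatcher only appends to a queue, and a single delivery thread owned by the notifier copies the listener list under the read lock, releases it, and then calls the listeners, so a listener may unregister itself from inside its callback.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Consumer;

// Hypothetical illustration of a notifier that owns its delivery queue; not a Tez class.
class QueueingNotifier {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final List<Consumer<String>> listeners = new ArrayList<>();
  private final BlockingQueue<String> events = new LinkedBlockingQueue<>();
  private final Thread deliveryThread = new Thread(this::deliverLoop, "state-change-delivery");

  void start() {
    deliveryThread.setDaemon(true);
    deliveryThread.start();
  }

  void shutdown() {
    deliveryThread.interrupt();
  }

  void register(Consumer<String> listener) {
    lock.writeLock().lock();
    try {
      listeners.add(listener);
    } finally {
      lock.writeLock().unlock();
    }
  }

  void unregister(Consumer<String> listener) {
    lock.writeLock().lock();
    try {
      listeners.remove(listener);
    } finally {
      lock.writeLock().unlock();
    }
  }

  int listenerCount() {
    lock.readLock().lock();
    try {
      return listeners.size();
    } finally {
      lock.readLock().unlock();
    }
  }

  /** Called on the dispatcher thread: a non-blocking enqueue that takes no lock. */
  void notifyStateChange(String newState) {
    events.add(newState);
  }

  private void deliverLoop() {
    try {
      while (true) {
        String state = events.take();
        List<Consumer<String>> snapshot;
        lock.readLock().lock();
        try {
          snapshot = new ArrayList<>(listeners);
        } finally {
          lock.readLock().unlock();
        }
        // Listeners run with no lock held, so they are free to call unregister().
        for (Consumer<String> listener : snapshot) {
          listener.accept(state);
        }
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // an interrupt from shutdown() ends the loop at take()
    }
  }

  /** A listener unregisters itself on delivery; returns the remaining count, or -1 on timeout. */
  static int demo() {
    QueueingNotifier notifier = new QueueingNotifier();
    notifier.start();
    CountDownLatch done = new CountDownLatch(1);
    AtomicReference<Consumer<String>> self = new AtomicReference<>();
    self.set(state -> {
      notifier.unregister(self.get()); // safe: the delivery thread holds no lock here
      done.countDown();
    });
    notifier.register(self.get());
    notifier.notifyStateChange("SUCCEEDED");
    try {
      return done.await(5, TimeUnit.SECONDS) ? notifier.listenerCount() : -1;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return -1;
    } finally {
      notifier.shutdown();
    }
  }

  public static void main(String[] args) {
    System.out.println("listeners left: " + demo());
  }
}
```

The trade-off matches the comment: one shared delivery thread keeps the dispatcher unblocked, but a slow listener still delays every other listener, which is why per-component threads for VertexManager / EdgeManager may be needed anyway.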
> Locking issue with StateChangeNotifier
> --------------------------------------
>
> Key: TEZ-1714
> URL: https://issues.apache.org/jira/browse/TEZ-1714
> Project: Apache Tez
> Issue Type: Bug
> Affects Versions: 0.5.1
> Reporter: Bikas Saha
> Priority: Critical
>
> The StateChangeNotifier takes a read lock and notifies listeners using a
> direct method call. This notification could lead to the listener
> completing. At this point, it may decide to unregister from further status
> updates, and this should be allowed. However, unregister tries to take a
> write lock on the StateChangeNotifier while the same thread still holds the
> read lock, and the result is a deadlock.
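The reported pattern can be reduced to a few lines, assuming a java.util.concurrent.locks.ReentrantReadWriteLock (an assumption; the issue does not name the lock class). DirectCallNotifier and tryUnregister are hypothetical names. ReentrantReadWriteLock does not allow a thread to upgrade a read lock to the write lock, so a blocking writeLock().lock() from inside the listener would wait forever; the sketch uses a timed tryLock so the failure shows up as a false result instead of a hang.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Consumer;

// Hypothetical reduction of the reported pattern; DirectCallNotifier is not the Tez class.
class DirectCallNotifier {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final List<Consumer<String>> listeners = new ArrayList<>();

  void register(Consumer<String> listener) {
    lock.writeLock().lock();
    try {
      listeners.add(listener);
    } finally {
      lock.writeLock().unlock();
    }
  }

  /** The problematic shape: listener code runs inline while the read lock is held. */
  void notifyStateChange(String newState) {
    lock.readLock().lock();
    try {
      for (Consumer<String> listener : listeners) {
        listener.accept(newState);
      }
    } finally {
      lock.readLock().unlock();
    }
  }

  /**
   * Unregistering needs the write lock. Because a read lock cannot be upgraded, a plain
   * writeLock().lock() called from inside a listener would block forever; the timed
   * tryLock gives up after 100 ms and reports the failure instead.
   */
  boolean tryUnregister(Consumer<String> listener) {
    try {
      if (!lock.writeLock().tryLock(100, TimeUnit.MILLISECONDS)) {
        return false;
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
    try {
      return listeners.remove(listener);
    } finally {
      lock.writeLock().unlock();
    }
  }

  /** Returns whether a listener could unregister itself from inside its own callback. */
  static boolean demo() {
    DirectCallNotifier notifier = new DirectCallNotifier();
    AtomicBoolean unregistered = new AtomicBoolean();
    AtomicReference<Consumer<String>> self = new AtomicReference<>();
    self.set(state -> unregistered.set(notifier.tryUnregister(self.get())));
    notifier.register(self.get());
    notifier.notifyStateChange("SUCCEEDED"); // the listener completes and tries to unregister
    return unregistered.get();
  }

  public static void main(String[] args) {
    System.out.println("unregister succeeded: " + demo());
  }
}
```

Here demo() returns false: the write lock cannot be acquired while the same thread is still inside the read-locked notification loop.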