[
https://issues.apache.org/jira/browse/FLINK-15036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16986913#comment-16986913
]
Yang Wang commented on FLINK-15036:
-----------------------------------
[~trohrmann] You are right. It should be handled in the main thread of
{{YarnResourceManager}}. Otherwise, concurrency errors may occur. We could
wrap all of the code in {{onStartContainerError}} in a {{runAsync}} call
as a quick fix.
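
A minimal sketch of that quick fix, not the actual Flink patch. It assumes the main thread can be modelled as a single-threaded executor; the class {{MainThreadDispatchSketch}}, its fields, and the local {{runAsync}} helper are hypothetical stand-ins for illustration only. The callback does no work on the caller's thread and instead hands its whole body to the main thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical stand-in for a resource manager whose state is owned by one main thread. */
public class MainThreadDispatchSketch {

    static final String MAIN_THREAD_NAME = "main-thread-sketch";

    // Assumption: a single-threaded executor models the component's main thread.
    private final ExecutorService mainThread =
            Executors.newSingleThreadExecutor(r -> new Thread(r, MAIN_THREAD_NAME));

    // Deliberately not thread-safe: only the main thread may touch these lists.
    private final List<String> failedContainers = new ArrayList<>();
    private final List<String> handlerThreads = new ArrayList<>();

    /** Invoked on an arbitrary callback thread, like an asynchronous client callback. */
    public void onStartContainerError(String containerId, Throwable cause) {
        // The whole body is wrapped, so no state is changed on the callback thread.
        runAsync(() -> {
            failedContainers.add(containerId);
            handlerThreads.add(Thread.currentThread().getName());
        });
    }

    // Local helper: schedules the action on the main thread and returns immediately.
    private void runAsync(Runnable action) {
        mainThread.execute(action);
    }

    /** Reads state through the main thread too, so the copy is consistent. */
    public List<String> failedContainers() throws Exception {
        Callable<List<String>> copy = () -> new ArrayList<>(failedContainers);
        return mainThread.submit(copy).get();
    }

    public List<String> handlerThreads() throws Exception {
        Callable<List<String>> copy = () -> new ArrayList<>(handlerThreads);
        return mainThread.submit(copy).get();
    }

    public void shutdown() {
        mainThread.shutdown();
    }

    public static void main(String[] args) throws Exception {
        MainThreadDispatchSketch rm = new MainThreadDispatchSketch();
        List<Thread> callbackThreads = new ArrayList<>();
        // Several callback threads report start-up errors concurrently.
        for (int i = 0; i < 4; i++) {
            String id = "container_" + i;
            Thread t = new Thread(
                    () -> rm.onStartContainerError(id, new RuntimeException("start failed")));
            callbackThreads.add(t);
            t.start();
        }
        for (Thread t : callbackThreads) {
            t.join();
        }
        System.out.println("failed containers: " + rm.failedContainers());
        System.out.println("handled on threads: " + rm.handlerThreads());
        rm.shutdown();
    }
}
```

Because every mutation is funnelled through one thread, the lists need no locking. The order in which containers are recorded still depends on which callback thread reaches the executor first.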
> Container startup error will be handled outside of the YarnResourceManager's
> main thread
> -----------------------------------------------------------------------------------------
>
> Key: FLINK-15036
> URL: https://issues.apache.org/jira/browse/FLINK-15036
> Project: Flink
> Issue Type: Bug
> Components: Deployment / YARN
> Affects Versions: 1.10.0, 1.8.3, 1.9.2
> Reporter: Till Rohrmann
> Priority: Critical
> Fix For: 1.10.0, 1.8.3, 1.9.2
>
>
> With FLINK-13184, we replaced the {{NMClient}} with the {{NMClientAsync}}. As
> part of this change, container startup errors are now handled by a callback
> to {{NMClientAsync.CallbackHandler}}. The implementation of
> {{NMClientAsync.CallbackHandler#onStartContainerError}} will be called by the
> {{NMClientAsync}}. Since the implementation performs state-changing operations,
> it needs to run inside the {{YarnResourceManager}} main thread.