[ https://issues.apache.org/jira/browse/FLINK-25893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17495307#comment-17495307 ]
Xintong Song edited comment on FLINK-25893 at 2/21/22, 10:55 AM:
-----------------------------------------------------------------
Fixed via
- master (1.15): cb478fb751dbe28405152707040f9126b5a5269b
- release-1.14: 451c5aa98b516bc7dde2dedfe01a6d3ae8d9c8dd
was (Author: xintongsong):
Fixed via
- master (1.15): cb478fb751dbe28405152707040f9126b5a5269b
- release-1.14: waiting ci
> ResourceManagerServiceImpl's lifecycle can lead to exceptions
> -------------------------------------------------------------
>
> Key: FLINK-25893
> URL: https://issues.apache.org/jira/browse/FLINK-25893
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Coordination
> Affects Versions: 1.15.0, 1.14.3
> Reporter: Till Rohrmann
> Assignee: Xintong Song
> Priority: Critical
> Labels: pull-request-available
> Fix For: 1.15.0, 1.14.4
>
>
> The {{ResourceManagerServiceImpl}} lifecycle can lead to exceptions when
> calling {{ResourceManagerServiceImpl.deregisterApplication}}. The problem
> arises when the {{DispatcherResourceManagerComponent}} is shut down before the
> {{ResourceManagerServiceImpl}} gains leadership or while it is starting the
> {{ResourceManager}}.
> One problem is that {{deregisterApplication}} returns an exceptionally
> completed future if there is no leading {{ResourceManager}}.
> Another problem is that even if there is a leading {{ResourceManager}}, it
> may not have been started yet. In that case, the
> [ResourceManagerGateway.deregisterApplication|https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/resourcemanager/ResourceManagerServiceImpl.java#L143]
> call will be discarded. The reason for this behaviour is that we create a
> {{ResourceManager}} in one {{Runnable}} and only start it in another, so a
> {{deregisterApplication}} call can acquire the {{lock}} in between the two.
> I'd suggest correcting the lifecycle and contract of
> {{ResourceManagerServiceImpl.deregisterApplication}}.
> Please note that due to this problem, the error reporting of this method has
> been suppressed. See FLINK-25885 for more details.
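
For illustration only, the two failure modes described above can be reproduced in a small toy model. This is a sketch under stated assumptions, not Flink code and not the committed fix: ToyResourceManager, ToyService, createLeader, startLeader and createAndStartLeader are hypothetical names, and the two Runnables are modelled as two separate method calls that each take the lock on their own. createAndStartLeader shows one possible direction (create and start under a single lock acquisition); the actual fix may differ.

```java
import java.util.concurrent.CompletableFuture;

/** Toy model of the lifecycle problem; every name here is hypothetical. */
class LifecycleSketch {

    /** Stand-in for a ResourceManager that drops calls until it is started. */
    static final class ToyResourceManager {
        private boolean started;

        void start() {
            started = true;
        }

        /** Returns true only if the call was actually processed. */
        boolean deregister() {
            return started; // a call before start() is silently discarded
        }
    }

    /** Stand-in for the service that owns the leading resource manager. */
    static final class ToyService {
        private final Object lock = new Object();
        private ToyResourceManager leader; // null until leadership is gained

        /** First step (first Runnable in the description): create only. */
        void createLeader() {
            synchronized (lock) {
                leader = new ToyResourceManager();
            }
        }

        /** Second step (second Runnable): start the instance created earlier. */
        void startLeader() {
            synchronized (lock) {
                leader.start();
            }
        }

        /** Possible alternative: no window between creation and start. */
        void createAndStartLeader() {
            synchronized (lock) {
                leader = new ToyResourceManager();
                leader.start();
            }
        }

        CompletableFuture<Boolean> deregisterApplication() {
            synchronized (lock) {
                if (leader == null) {
                    // Problem 1: no leader yields an exceptionally completed future.
                    CompletableFuture<Boolean> failed = new CompletableFuture<>();
                    failed.completeExceptionally(new IllegalStateException("no leader"));
                    return failed;
                }
                // Problem 2: a created-but-not-started leader discards the call.
                return CompletableFuture.completedFuture(leader.deregister());
            }
        }
    }

    public static void main(String[] args) {
        ToyService service = new ToyService();
        System.out.println("no leader, failed: "
                + service.deregisterApplication().isCompletedExceptionally());

        service.createLeader(); // deregistration slips in before startLeader()
        System.out.println("created only, processed: "
                + service.deregisterApplication().join());

        service.startLeader();
        System.out.println("started, processed: "
                + service.deregisterApplication().join());
    }
}
```

In this model a deregistration issued between createLeader() and startLeader() completes normally but has no effect, which is the discarded call described above. With createAndStartLeader() that window does not exist.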