[ https://issues.apache.org/jira/browse/FLINK-20332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17288227#comment-17288227 ]
Till Rohrmann commented on FLINK-20332:
---------------------------------------
This sounds good to me as a first step.
> Add workers recovered from previous attempt to pending resources
> ----------------------------------------------------------------
>
> Key: FLINK-20332
> URL: https://issues.apache.org/jira/browse/FLINK-20332
> Project: Flink
> Issue Type: Sub-task
> Components: Runtime / Coordination
> Reporter: Xintong Song
> Assignee: Xintong Song
> Priority: Major
>
> For active deployments (Native K8s/Yarn/Mesos), after a JM failover, workers
> from the previous attempt should register with the new JM. Depending on the
> order in which slot requests and TM registrations arrive at the RM, the RM
> may allocate unnecessary new resources even though there are recovered
> resources that can be reused.
> A potential improvement is to add recovered workers to pending resources, so
> that the RM knows which resources are expected to become available soon and
> can decide whether to allocate new resources accordingly.
> See also the discussion in FLINK-20249.
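
The bookkeeping described in the issue could be sketched roughly as follows. This is only an illustration and not Flink's actual ResourceManager code: the class `PendingWorkerTracker` and all of its methods are hypothetical names, and it assumes for simplicity that each worker covers exactly one unit of demand.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch; none of these names come from the Flink code base. */
class PendingWorkerTracker {
    // Workers recovered from the previous attempt that have not registered yet.
    private final Set<String> recoveredNotRegistered = new HashSet<>();
    // Workers requested by this attempt that have not registered yet.
    private int newlyRequested = 0;

    /** Called once after failover with the worker IDs recovered from the previous attempt. */
    void onWorkersRecovered(Collection<String> workerIds) {
        recoveredNotRegistered.addAll(workerIds);
    }

    /** Called when a worker registers; it stops being pending. */
    void onWorkerRegistered(String workerId) {
        boolean wasRecovered = recoveredNotRegistered.remove(workerId);
        if (!wasRecovered && newlyRequested > 0) {
            newlyRequested--;
        }
    }

    /** Recovered and newly requested workers that are expected but not yet registered. */
    int pendingWorkers() {
        return recoveredNotRegistered.size() + newlyRequested;
    }

    /**
     * Returns how many new workers have to be requested so that pending workers
     * cover the given demand, and records them as pending.
     */
    int requestMissingWorkers(int requiredWorkers) {
        int missing = Math.max(0, requiredWorkers - pendingWorkers());
        newlyRequested += missing;
        return missing;
    }

    public static void main(String[] args) {
        PendingWorkerTracker tracker = new PendingWorkerTracker();
        tracker.onWorkersRecovered(Arrays.asList("worker-1", "worker-2"));
        // Demand for three workers arrives before any recovered worker has registered.
        System.out.println("new workers to request: " + tracker.requestMissingWorkers(3));
        tracker.onWorkerRegistered("worker-1");
        System.out.println("still pending: " + tracker.pendingWorkers());
    }
}
```

In this sketch, two recovered workers count as pending, so a demand for three workers triggers only one new request instead of three, regardless of whether the slot requests or the TM registrations arrive first.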
--
This message was sent by Atlassian Jira
(v8.3.4#803005)