[
https://issues.apache.org/jira/browse/FLINK-23849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17403761#comment-17403761
]
zlzhang0122 commented on FLINK-23849:
-------------------------------------
[~trohrmann] OK, I see; maybe the community has more pressing concerns right
now. But IMO the automatic recovery strategy can't guarantee end-to-end
exactly-once if the downstream system doesn't support transactional or
idempotent writes. Supporting a reaction to node updates such as decommissioning
would also bring YARN to functional parity with k8s taints, and it's useful for
gracefully restarting streaming jobs.
> Support react to the node decommissioning change state on yarn and do
> graceful restart
> --------------------------------------------------------------------------------------
>
> Key: FLINK-23849
> URL: https://issues.apache.org/jira/browse/FLINK-23849
> Project: Flink
> Issue Type: New Feature
> Components: Deployment / YARN
> Affects Versions: 1.12.2, 1.13.1, 1.13.2
> Reporter: zlzhang0122
> Priority: Major
> Fix For: 1.15.0
>
>
> Currently we ignore node updates in
> YarnContainerEventHandler.onNodesUpdated, but sometimes we want to evict the
> running Flink process from one node and gracefully restart it on another
> node for some unexpected reason, for example when the physical machine needs
> to be recycled or the cloud computing cluster needs to be migrated. We could
> therefore react when a node changes to the decommissioning state by calling
> the stopWithSavepoint function and then restarting the job.
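
To make the proposal above more concrete, here is a rough sketch of what the
reaction could look like. It does not use the real Flink or YARN classes: the
NodeState enum, NodeReport, GracefulRestarter and NodeUpdateHandler types are
all hypothetical stand-ins defined locally, and stopWithSavepointAndRestart is
an assumed hook, not an existing Flink method. The only idea taken from the
description is "on a node update, if a node we run on is decommissioning,
stop with a savepoint and restart".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DecommissionSketch {

    /** Local stand-in for the node states reported by YARN (hypothetical subset). */
    public enum NodeState { RUNNING, DECOMMISSIONING, DECOMMISSIONED }

    /** Local stand-in for a node report: host name plus its current state. */
    public static final class NodeReport {
        public final String host;
        public final NodeState state;

        public NodeReport(String host, NodeState state) {
            this.host = host;
            this.state = state;
        }
    }

    /** Hypothetical hook that would stop the job with a savepoint and restart it. */
    public interface GracefulRestarter {
        void stopWithSavepointAndRestart(String decommissioningHost);
    }

    /** Assumed shape of the logic that onNodesUpdated could delegate to. */
    public static final class NodeUpdateHandler {
        private final Set<String> hostsInUse;
        private final GracefulRestarter restarter;
        // Hosts we already reacted to, so repeated updates trigger only one restart.
        private final Set<String> handledHosts = new HashSet<>();

        public NodeUpdateHandler(Set<String> hostsInUse, GracefulRestarter restarter) {
            this.hostsInUse = hostsInUse;
            this.restarter = restarter;
        }

        public void onNodesUpdated(List<NodeReport> updatedNodes) {
            for (NodeReport report : updatedNodes) {
                boolean decommissioning = report.state == NodeState.DECOMMISSIONING;
                boolean affectsUs = hostsInUse.contains(report.host);
                // Set.add returns false when the host was already handled.
                if (decommissioning && affectsUs && handledHosts.add(report.host)) {
                    restarter.stopWithSavepointAndRestart(report.host);
                }
            }
        }
    }

    public static void main(String[] args) {
        List<String> restarts = new ArrayList<>();
        NodeUpdateHandler handler =
                new NodeUpdateHandler(new HashSet<>(Arrays.asList("host-a")), restarts::add);

        handler.onNodesUpdated(Arrays.asList(
                new NodeReport("host-a", NodeState.DECOMMISSIONING),
                new NodeReport("host-b", NodeState.DECOMMISSIONING)));

        System.out.println(restarts); // prints [host-a]
    }
}
```

Only host-a triggers a restart because no container of the job runs on host-b.
Remembering the handled hosts is a design choice in this sketch so that a node
reported as decommissioning several times does not cause several savepoints.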
--
This message was sent by Atlassian Jira
(v8.3.4#803005)