[
https://issues.apache.org/jira/browse/FLINK-27576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17571152#comment-17571152
]
Aitozi commented on FLINK-27576:
--------------------------------
Hi [~zhisheng], I think this problem have been fixed via
https://github.com/apache/flink/pull/20256 . Could you try on that by configure
a suitable {{resourcemanager.previous-worker.recovery.timeout}}. If it works,
this ticket could be closed. Looking forward to your feedback :).
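As a minimal sketch (the 3-minute value below is only an illustrative assumption, not a recommendation from this thread), the option would be set in {{flink-conf.yaml}} so the ResourceManager waits for previously allocated TaskExecutors to re-register before requesting fresh pods:

```yaml
# flink-conf.yaml (illustrative values)
high-availability: zookeeper
high-availability.zookeeper.quorum: zk-host:2181   # placeholder address

# How long the ResourceManager waits for workers from the previous
# attempt to reconnect before allocating new ones.
resourcemanager.previous-worker.recovery.timeout: 3 min
```

The timeout should roughly cover the time an old TM pod needs to rediscover the restarted JM through the HA service; too small a value reintroduces the duplicate-pod behavior described below.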
> Flink requests new TM pods when the JM pod is deleted, but removes them once
> the TaskExecutor exceeds the idle timeout
> --------------------------------------------------------------------------------------------------------------
>
> Key: FLINK-27576
> URL: https://issues.apache.org/jira/browse/FLINK-27576
> Project: Flink
> Issue Type: Bug
> Components: Deployment / Kubernetes
> Affects Versions: 1.12.0
> Reporter: zhisheng
> Priority: Major
> Attachments: image-2022-05-11-20-06-58-955.png,
> image-2022-05-11-20-08-01-739.png, jobmanager_log.txt
>
>
> Flink 1.12.0 with HA (ZooKeeper) and checkpointing enabled: when I use
> kubectl to delete the JM pod, the job requests a new JM pod and fails over
> from the last checkpoint, which is fine. However, it also requests new TM
> pods that are never actually used; these new TM pods are closed once the
> TaskExecutor exceeds the idle timeout, while the job actually keeps using
> the old TMs. Why are new TM pods requested at all? Would the job fail if the
> cluster had no resources for the new TMs? Could this be optimized to reuse
> the old TMs directly?
>
> [^jobmanager_log.txt]
> ^!image-2022-05-11-20-06-58-955.png!^
> ^!image-2022-05-11-20-08-01-739.png|width=857,height=324!^
--
This message was sent by Atlassian Jira
(v8.20.10#820010)