mazhenzz created FLINK-34197:
--------------------------------
Summary: How can i recover job by savepoint with multi-job run by
executeAsync in application mode
Key: FLINK-34197
URL: https://issues.apache.org/jira/browse/FLINK-34197
Project: Flink
Issue Type: Technical Debt
Components: API / Core
Affects Versions: 1.18.1
Reporter: mazhenzz
Hello, I'm working with Flink 1.18 (Java) and want to use Application Mode to
run 2 jobs in one pod (Kubernetes, Docker deployment).
In the Java code, I use a _for_ loop to create 2 or more jobs with
env.executeAsync, creating a new env in each iteration. This lets us run
multiple jobs in parallel in one pod to reduce resource cost.
In Application Mode, I don't think I can rely on checkpoints for recovery,
because HA cannot be enabled for an application that runs multiple jobs, so the
previous job IDs cannot be stored in ZooKeeper to recover from a checkpoint. Ref:
https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/deployment/overview/#application-mode
So I want to recover from savepoints when the pod goes down or needs to be
restarted. My questions are:
* How can I trigger a savepoint for each job (currently 2 jobs in one pod) every
hour?
* How can I restore each job from its savepoint when the pod restarts, using
Java code or the REST API?
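
For the first question, here is a minimal sketch of what I have in mind, not a
confirmed solution. It uses only the JDK HTTP client and posts to Flink's REST
endpoint POST /jobs/:jobid/savepoints once per hour for each job. The class name
HourlySavepoints, the REST address http://localhost:8081, and the target
directory s3://my-bucket/savepoints are assumptions for illustration. The job
IDs are passed as program arguments here; in my application they could be taken
from JobClient.getJobID() on the value returned by env.executeAsync.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class HourlySavepoints {

    // URI of the "trigger savepoint" REST endpoint for one job.
    static URI triggerUri(String restBase, String jobId) {
        return URI.create(restBase + "/jobs/" + jobId + "/savepoints");
    }

    // JSON body: write the savepoint under targetDir and keep the job running.
    static String triggerBody(String targetDir) {
        return "{\"target-directory\": \"" + targetDir + "\", \"cancel-job\": false}";
    }

    // Full POST request for triggering a savepoint of one job.
    static HttpRequest triggerRequest(String restBase, String jobId, String targetDir) {
        return HttpRequest.newBuilder(triggerUri(restBase, jobId))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(triggerBody(targetDir)))
                .build();
    }

    public static void main(String[] args) {
        String restBase = "http://localhost:8081";        // assumed JobManager REST address
        String targetRoot = "s3://my-bucket/savepoints";  // hypothetical savepoint location
        List<String> jobIds = List.of(args);              // one job ID per argument

        HttpClient client = HttpClient.newHttpClient();
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

        // Trigger one savepoint per job immediately, then once every hour.
        scheduler.scheduleAtFixedRate(() -> {
            for (String jobId : jobIds) {
                try {
                    HttpResponse<String> response = client.send(
                            triggerRequest(restBase, jobId, targetRoot + "/" + jobId),
                            HttpResponse.BodyHandlers.ofString());
                    // The response body carries a request id for polling the result.
                    System.out.println(jobId + " -> " + response.statusCode()
                            + " " + response.body());
                } catch (Exception e) {
                    System.err.println("Savepoint trigger failed for " + jobId + ": " + e);
                }
            }
        }, 0, 1, TimeUnit.HOURS);
    }
}
```

The trigger call is asynchronous, so the final savepoint path would still have to
be read by polling GET /jobs/:jobid/savepoints/:triggerid with the returned
request id. Is this the recommended approach for multiple jobs in one pod?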