[ https://issues.apache.org/jira/browse/FLINK-19545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17228907#comment-17228907 ]
Yang Wang commented on FLINK-19545:
-----------------------------------
Actually, the IT cases above do not include an E2E test that submits a job via
the Flink CLI, kills the JobManager and checks whether the job recovers
successfully from the latest checkpoint. Such a test covers the most basic
Kubernetes HA behavior and would help us make sure that behavior never breaks.
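The recovery check described above could be scripted roughly as follows. This is a minimal sketch, not the actual test: it assumes the JobManager REST endpoint is reachable from the test (e.g. through a hypothetical `kubectl port-forward`), and it reads the `latest.completed` and `latest.restored` entries returned by Flink's `GET /jobs/:jobid/checkpoints` endpoint (field names should be verified against the Flink version in use). All function names are hypothetical. The idea: record the latest completed checkpoint ID before killing the JobManager, then poll until the job reports a restore from a checkpoint with at least that ID.

```python
import json
import time
import urllib.request
from typing import Callable, Optional


def fetch_checkpoint_stats(rest_url: str, job_id: str) -> dict:
    """Query the JobManager REST endpoint GET /jobs/:jobid/checkpoints."""
    url = f"{rest_url}/jobs/{job_id}/checkpoints"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


def latest_completed_id(stats: dict) -> Optional[int]:
    """ID of the most recently completed checkpoint, or None if there is none."""
    completed = (stats.get("latest") or {}).get("completed")
    return completed["id"] if completed else None


def restored_checkpoint_id(stats: dict) -> Optional[int]:
    """ID of the checkpoint the job was restored from, or None if not restored."""
    restored = (stats.get("latest") or {}).get("restored")
    return restored["id"] if restored else None


def wait_for_restore(
    fetch: Callable[[], dict],
    min_checkpoint_id: int,
    timeout_s: float = 300.0,
    poll_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Poll until the job reports a restore from a checkpoint >= min_checkpoint_id.

    Returns the restored checkpoint ID, or raises TimeoutError.
    """
    deadline = clock() + timeout_s
    while clock() < deadline:
        try:
            restored = restored_checkpoint_id(fetch())
        except (OSError, ValueError):
            # The REST endpoint is unreachable or returns non-JSON while the
            # JobManager is restarting; treat it as "not recovered yet".
            restored = None
        if restored is not None and restored >= min_checkpoint_id:
            return restored
        sleep(poll_s)
    raise TimeoutError(
        f"job was not restored from a checkpoint >= {min_checkpoint_id} "
        f"within {timeout_s}s"
    )
```

In a test, one would call `latest_completed_id(fetch_checkpoint_stats(url, job_id))` before killing the JobManager pod and pass the result as `min_checkpoint_id`. Using `>=` rather than `==` tolerates another checkpoint completing between reading the ID and killing the pod. The `sleep` and `clock` parameters are injectable only so the polling logic can be exercised without a cluster.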
As for the Jepsen tests, I was not aware of this project before and will learn
more about it. It makes sense to me to make it work on Kubernetes as well.
> Add e2e test for native Kubernetes HA
> -------------------------------------
>
> Key: FLINK-19545
> URL: https://issues.apache.org/jira/browse/FLINK-19545
> Project: Flink
> Issue Type: Sub-task
> Components: Tests
> Reporter: Yang Wang
> Assignee: Yang Wang
> Priority: Major
> Fix For: 1.12.0
>
>
> We could use minikube for the E2E tests: start a Flink session/application
> cluster on K8s, kill one TaskManager pod or the JobManager pod, and wait for
> the job to recover successfully from the latest checkpoint.
> kubectl exec -it {pod_name} -- /bin/sh -c "kill 1"
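The pod-kill step from the issue description could be driven from a test script along these lines. This is a hedged sketch with hypothetical helper names: it drops `-it` because a scripted run has no interactive terminal, and it assumes the killed container is restarted in place (so its `restartCount` increases) rather than the pod being replaced under a new name. The exit status of `kubectl exec` is deliberately not checked, on the assumption that the exec session may be cut off when the container it runs in terminates.

```python
import subprocess
from typing import List


def kill_main_process_cmd(pod_name: str, namespace: str = "default") -> List[str]:
    """Build the command from the issue: kill PID 1 inside the pod's container."""
    return ["kubectl", "exec", "-n", namespace, pod_name, "--", "/bin/sh", "-c", "kill 1"]


def restart_count_cmd(pod_name: str, namespace: str = "default") -> List[str]:
    """Build a command that prints the first container's restart count."""
    return [
        "kubectl", "get", "pod", pod_name, "-n", namespace,
        "-o", "jsonpath={.status.containerStatuses[0].restartCount}",
    ]


def kill_main_process(pod_name: str, namespace: str = "default") -> int:
    # Return code is reported but not checked (see the assumption above).
    return subprocess.run(kill_main_process_cmd(pod_name, namespace)).returncode


def restart_count(pod_name: str, namespace: str = "default") -> int:
    out = subprocess.run(
        restart_count_cmd(pod_name, namespace),
        check=True, capture_output=True, text=True,
    ).stdout
    # An empty result (no container status yet) is treated as zero restarts.
    return int(out.strip() or 0)
```

A test could read `restart_count` before calling `kill_main_process`, wait until the count increases, and only then start checking whether the job recovered from the latest checkpoint.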
--
This message was sent by Atlassian Jira
(v8.3.4#803005)