Adrian Vasiliu created FLINK-25098:
--------------------------------------
Summary: Jobmanager CrashLoopBackOff in HA configuration
Key: FLINK-25098
URL: https://issues.apache.org/jira/browse/FLINK-25098
Project: Flink
Issue Type: Bug
Components: Deployment / Kubernetes
Affects Versions: 1.13.3, 1.13.2
Environment: Reproduced with:
* Persistent jobs storage provided by the rocks-cephfs storage class.
* OpenShift 4.9.5.
Reporter: Adrian Vasiliu
In a Kubernetes deployment of Flink 1.13.2 (also reproduced with Flink 1.13.3),
enabling Flink HA by running 3 replicas of the jobmanager leads to
CrashLoopBackOff for all replicas.
Attaching the full logs of the `jobmanager` and `tls-proxy` containers of
the jobmanager pod:
[^jm-flink-ha-jobmanager-log.txt]
[^jm-flink-ha-tls-proxy-log.txt]
Remarks:
* This is a follow-up to
https://issues.apache.org/jira/browse/FLINK-22014?focusedCommentId=17450524&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17450524.
* Picked Critical severity as HA is critical for our product.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)