[jira] [Updated] (FLINK-25098) Jobmanager CrashLoopBackOff in HA configuration

Adrian Vasiliu (Jira) Mon, 29 Nov 2021 12:57:05 -0800


     [ 
https://issues.apache.org/jira/browse/FLINK-25098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Adrian Vasiliu updated FLINK-25098:
-----------------------------------
    Description: 
In a Kubernetes deployment of Flink 1.13.2 (also reproduced with Flink 1.13.3), 
switching to Flink HA by running 3 replicas of the jobmanager sends all 
replicas into CrashLoopBackOff.

Attaching the full logs of the `jobmanager` and `tls-proxy` containers of 
the jobmanager pod:
[^jm-flink-ha-jobmanager-log.txt]
[^jm-flink-ha-tls-proxy-log.txt]

Remarks:
 * This is a follow-up of 
https://issues.apache.org/jira/browse/FLINK-22014?focusedCommentId=17450524&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17450524.
 
 * Picked Critical severity as HA is critical for our product.

  was:
In a Kubernetes deployment of Flink 1.13.2 (also reproduced with Flink 1.13.3), 
switching to Flink HA by running 3 replicas of the jobmanager sends all 
replicas into CrashLoopBackOff.

Attaching the full logs of the `jobmanager` and `tls-proxy` containers of 
the jobmanager pod:
[^jm-flink-ha-jobmanager-log.txt]
[^jm-flink-ha-tls-proxy-log.txt]

Remarks:
* This is a follow-up of 
https://issues.apache.org/jira/browse/FLINK-22014?focusedCommentId=17450524&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17450524.
 
* Picked Critical severity as HA is critical for our product.


> Jobmanager CrashLoopBackOff in HA configuration
> -----------------------------------------------
>
>                 Key: FLINK-25098
>                 URL: https://issues.apache.org/jira/browse/FLINK-25098
>             Project: Flink
>          Issue Type: Bug
>          Components: Deployment / Kubernetes
>    Affects Versions: 1.13.2, 1.13.3
>         Environment: Reproduced with:
> * Persistent jobs storage provided by the rocks-cephfs storage class.
> * OpenShift 4.9.5.
>            Reporter: Adrian Vasiliu
>            Priority: Critical
>         Attachments: jm-flink-ha-jobmanager-log.txt, 
> jm-flink-ha-tls-proxy-log.txt
>
>
> In a Kubernetes deployment of Flink 1.13.2 (also reproduced with Flink 
> 1.13.3), switching to Flink HA by running 3 replicas of the jobmanager sends 
> all replicas into CrashLoopBackOff.
> Attaching the full logs of the `jobmanager` and `tls-proxy` containers of 
> the jobmanager pod:
> [^jm-flink-ha-jobmanager-log.txt]
> [^jm-flink-ha-tls-proxy-log.txt]
> Remarks:
>  * This is a follow-up of 
> https://issues.apache.org/jira/browse/FLINK-22014?focusedCommentId=17450524&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17450524.
>  
>  * Picked Critical severity as HA is critical for our product.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
