Argo CD health check for FlinkDeployment

Xingcan Cui Wed, 16 Nov 2022 07:20:27 -0800

Hi all,

We are exploring Argo CD to manage `FlinkDeployment` resources, but noticed
that its health check doesn't work properly.


To give you some context, Argo CD uses Lua scripts to check some
state-related fields and map them to three status values: "Healthy",
"Progressing" and "Degraded". The current implementation
<https://github.com/argoproj/argo-cd/pull/9300> uses some legacy fields
(e.g., status.reconciliationStatus.success) that have been removed
<https://github.com/apache/flink-kubernetes-operator/pull/165/files#diff-77c3de65b7bd2db04eeeae370a85cec77f7d7eb22ef801ef11305ede88cb315a>
a long time ago. As a result, users always get the "Progressing" status.

To fix the issue, we plan to re-implement the health checking logic, and we
have three questions:

1. Is it reasonable to simply use "obj.status.jobStatus.state" as the
indicator, i.e., map "RUNNING" to "Healthy", map "FAILING" and "FAILED" to
"Degraded", and map the remaining states to "Progressing"?
2. I know the Flink-K8s-operator project is still in active development.
Given that the health checking logic is coupled with the state fields, can
we consider those fields stable now?
3. Can we apply the same logic to "FlinkSessionJob"?
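
For concreteness, the mapping proposed in question 1 could be sketched roughly as below. This is only an illustration in Python, not the Lua script Argo CD would actually run. The function name `flink_health`, the case-insensitive comparison, and treating a missing status as "Progressing" are all assumptions made for this sketch.

```python
def flink_health(obj: dict) -> str:
    """Map obj.status.jobStatus.state to an Argo CD health status.

    Hypothetical helper for illustration only; the real check would be
    a Lua script inside Argo CD.
    """
    # Walk status -> jobStatus -> state, tolerating missing or null levels.
    status = obj.get("status") or {}
    job_status = status.get("jobStatus") or {}
    state = job_status.get("state")

    # Assumption: compare case-insensitively so "running" and "RUNNING" match.
    normalized = state.upper() if isinstance(state, str) else ""

    if normalized == "RUNNING":
        return "Healthy"
    if normalized in ("FAILING", "FAILED"):
        return "Degraded"
    # Every other state, and an absent state, falls through to "Progressing".
    return "Progressing"
```

Under this sketch, a resource whose status has not been populated yet is reported as "Progressing" rather than "Degraded". Whether that is the right default is part of what we would like feedback on.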

Thanks,
Xingcan
