Re: [PR] HDDS-11618. Enable HA mode for OM and SCM [ozone-helm-charts]

via GitHub Wed, 05 Nov 2025 20:55:40 -0800


ptlrs commented on code in PR #20:
URL: https://github.com/apache/ozone-helm-charts/pull/20#discussion_r2497334380



##########
charts/ozone/templates/helm/om-leader-transfer-job.yaml:
##########
@@ -0,0 +1,82 @@
+{{- if .Values.om.persistence.enabled }}
+{{- $dnodes := ternary (splitList "," (include "ozone.om.decommissioned.nodes" .)) (list) (ne "" (include "ozone.om.decommissioned.nodes" .)) }}
+{{- $env := concat .Values.env .Values.helm.env }}
+{{- $envFrom := concat .Values.envFrom .Values.helm.envFrom }}
+{{- $nodeSelector := or .Values.helm.nodeSelector .Values.nodeSelector }}
+{{- $affinity := or .Values.helm.affinity .Values.affinity }}
+{{- $tolerations := or .Values.helm.tolerations .Values.tolerations }}
+{{- $securityContext := or .Values.helm.securityContext .Values.securityContext }}
+{{- if (gt (len $dnodes) 0) }}
+---
+apiVersion: batch/v1
+kind: Job
+metadata:
+  name: {{ printf "%s-helm-manager-leader-transfer" $.Release.Name }}
+  labels:
+    {{- include "ozone.labels" $ | nindent 4 }}
+    app.kubernetes.io/component: helm-manager
+  annotations:
+    "helm.sh/hook": pre-upgrade
+    "helm.sh/hook-weight": "0"
+    "helm.sh/hook-delete-policy": hook-succeeded,hook-failed
+spec:
+  backoffLimit: {{ $.Values.helm.backoffLimit }}
+  template:
+    metadata:
+      labels:
+        {{- include "ozone.selectorLabels" $ | nindent 8 }}
+        app.kubernetes.io/component: helm-manager
+    spec:
+      containers:
+        - name: om-leader-transfer
+          image: "{{ $.Values.image.repository }}:{{ $.Values.image.tag | default $.Chart.AppVersion }}"
+          imagePullPolicy: {{ $.Values.image.pullPolicy }}
+          {{- with $.Values.om.command }}
+          command: {{- tpl (toYaml .) $ | nindent 12 }}
+          {{- end }}
+          args:
+            - sh
+            - -c
+            - |
+              set -e
+              exec ozone admin om transfer -id={{ $.Values.clusterId }} -n={{ $.Release.Name }}-om-0
+          env:

Review Comment:
   Some thoughts on this process. We should check whether an alternative 
approach is available here. 
   
   1. Transferring to om-0 assumes that om-0 is not a lagging replica. If it 
is lagging significantly, we will have to go through the bootstrap process on 
om-0, which may take an indeterminate amount of time.
   2. We don't wait for the transfer to om-0 to complete, and we don't validate 
that it succeeded. It is done on a best-effort basis.
   3. By always transferring to om-0 we introduce a dependency on one specific 
pod in the Ozone cluster. We would never be able to decommission the om-0 pod, 
even if we wanted to. 
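
   To make points 2 and 3 concrete, here is a rough, untested sketch of what 
the job script could do instead: transfer only when the current leader is 
actually being decommissioned, pick any remaining OM as the target rather than 
hard-coding om-0, and poll until the target reports itself as leader. The 
function names, the retry defaults, and the assumed output format of 
`ozone admin om roles` (one `<nodeId> : <ROLE> (<host>)` line per OM) are my 
assumptions and would need to be checked against the Ozone version in the 
image. It does not address point 1 (replica lag).

```shell
# Sketch only. ASSUMPTION: `ozone admin om roles -id=<serviceId>` prints one
# line per OM shaped like "om1 : LEADER (host)". Verify before relying on it.

# Print "<nodeId> <ROLE>" pairs for every OM in the service.
om_roles() {
  ozone admin om roles -id="$1" | awk -F'[ :]+' 'NF >= 2 { print $1, $2 }'
}

# Print the node ID of the current leader (empty if none is reported).
current_leader() {
  om_roles "$1" | awk '$2 == "LEADER" { print $1; exit }'
}

# Print the first OM that is not in the comma-separated exclude list ($2).
pick_target() {
  om_roles "$1" | while read -r node _role; do
    case ",$2," in
      *",$node,"*) continue ;;  # node is being decommissioned, skip it
    esac
    echo "$node"
    break
  done
}

# Poll until $2 is the leader; give up after $3 attempts, $4 seconds apart.
wait_for_leader() {
  i=0
  while [ "$i" -lt "${3:-30}" ]; do
    if [ "$(current_leader "$1")" = "$2" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "${4:-5}"
  done
  return 1
}

# Args: serviceId, comma-separated decommissioned nodes, [attempts], [delay].
transfer_leadership() {
  service_id=$1
  excluded=$2
  leader=$(current_leader "$service_id")
  [ -n "$leader" ] || { echo "no leader reported" >&2; return 1; }
  case ",$excluded," in
    *",$leader,"*) ;;  # leader is leaving, so leadership has to move
    *) echo "leader $leader is staying; nothing to do"; return 0 ;;
  esac
  target=$(pick_target "$service_id" "$excluded")
  [ -n "$target" ] || { echo "no eligible OM to transfer to" >&2; return 1; }
  ozone admin om transfer -id="$service_id" -n="$target" || return 1
  wait_for_leader "$service_id" "$target" "${3:-30}" "${4:-5}"
}
```

   In the template this might be called as something like 
`transfer_leadership "{{ $.Values.clusterId }}" "{{ join "," $dnodes }}"`, so 
that the hook fails, and the upgrade stops, when the transfer cannot be 
confirmed.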



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

