rakeshadr commented on code in PR #20:
URL: https://github.com/apache/ozone-helm-charts/pull/20#discussion_r2631983830


##########
charts/ozone/templates/datanode/datanode-statefulset.yaml:
##########
@@ -65,6 +65,10 @@ spec:
           ports:
             - name: ui
               containerPort: {{ .Values.datanode.service.port }}
+            - name: ratis-ipc
+              containerPort: {{ .Values.datanode.service.ratisIpcPort }}
+            - name: ipc
+              containerPort: {{ .Values.datanode.service.ipcPort }}
           livenessProbe:

Review Comment:
   Presently, only a livenessProbe is configured, which is fine for non-HA, but HA deployments may take longer to initialize. How about adding a startupProbe and a readinessProbe for HA deployments to handle the longer initialization?
   
   This could be raised as a follow-up task. Below is a sample reference for OM and SCM, plus a minimal datanode sketch at the end.
   
   For OM:
   ```
   startupProbe:
     exec:
       command:
         - sh
         - -c
         - |
           # Check if OM has joined the Ratis ring
           ozone admin om roles -id {{ .Values.clusterId }} 2>/dev/null | grep -q $(hostname) && exit 0 || exit 1
     initialDelaySeconds: 30
     periodSeconds: 10
     failureThreshold: 30  # 5 minutes for bootstrap
     timeoutSeconds: 5
   
   readinessProbe:
     exec:
       command:
         - sh
         - -c
         - |
           # Check if OM is in LEADER or FOLLOWER state
           ozone admin om roles -id {{ .Values.clusterId }} 2>/dev/null | grep $(hostname) | grep -qE 'LEADER|FOLLOWER' && exit 0 || exit 1
     initialDelaySeconds: 10
     periodSeconds: 10
     timeoutSeconds: 5
     failureThreshold: 3
   ```
   
   For SCM:
   ```
   startupProbe:
     exec:
       command:
         - sh
         - -c
         - |
           # Check if SCM has joined the Ratis ring
           ozone admin scm roles 2>/dev/null | grep -q $(hostname -f) && exit 0 || exit 1
     initialDelaySeconds: 20
     periodSeconds: 10
     failureThreshold: 30
     timeoutSeconds: 5
   
   readinessProbe:
     exec:
       command:
         - sh
         - -c
         - |
           # Check if SCM is in LEADER or FOLLOWER state (a SAFE_MODE check could be added on top)
           ozone admin scm roles 2>/dev/null | grep $(hostname -f) | grep -qE 'LEADER|FOLLOWER' && exit 0 || exit 1
     initialDelaySeconds: 10
     periodSeconds: 10
     timeoutSeconds: 5
     failureThreshold: 3
   ```
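   
   For the datanode itself (the file in this hunk), even a plain tcpSocket readinessProbe against the newly exposed ports would help. A minimal sketch, assuming the port names added above:
   ```
   readinessProbe:
     tcpSocket:
       port: ratis-ipc   # named containerPort added in this PR
     initialDelaySeconds: 10
     periodSeconds: 10
     timeoutSeconds: 5
     failureThreshold: 3
   ```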



##########
charts/ozone/templates/om/om-service-headless.yaml:
##########
@@ -28,6 +28,8 @@ spec:
   ports:
     - name: ui
       port: {{ .Values.om.service.port }}
+    - name: ratis
+      port: {{ .Values.om.service.ratisPort }}

Review Comment:
   The RPC port (9862) is missing from the headless service, but it is exposed in the container (line 92 of om-statefulset.yaml). This port is used for client connections, isn't it? Have you performed Ozone client file write/read operations, and did they work fine without it?
   ```
   ports:
     - name: rpc
       port: 9862
   ```
   
   Or, templated from values:
   ```
   ports:
     - name: rpc
       port: {{ .Values.om.service.rpcPort }}
   ```
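   
   Putting it together, the headless service's port list could then look like this (a sketch; om.service.rpcPort is already referenced elsewhere in this PR):
   ```
   ports:
     - name: ui
       port: {{ .Values.om.service.port }}
     - name: ratis
       port: {{ .Values.om.service.ratisPort }}
     - name: rpc
       port: {{ .Values.om.service.rpcPort }}
   ```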



##########
charts/ozone/templates/scm/scm-statefulset.yaml:
##########
@@ -31,6 +31,7 @@ metadata:
     app.kubernetes.io/component: scm
 spec:
   replicas: {{ .Values.scm.replicas }}
+  podManagementPolicy: Parallel

Review Comment:
   
https://github.com/apache/ozone-helm-charts/blob/c0fb376573ec6822b19c7e0814896f979f2969d7/charts/ozone/templates/scm/scm-statefulset.yaml#L43
   
   The OM includes the dynamic environment config in its checksum annotation, but the SCM doesn't. That means SCM pods won't be restarted on config changes, which seems inconsistent to me.
   
   Since ozone.configuration.env generates dynamic config based on replica changes, both OM and SCM should restart, right? See the sketch below.
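   
   A minimal sketch for the SCM statefulset, mirroring the OM annotation introduced in this PR (adjust to the actual template structure):
   ```
   spec:
     template:
       metadata:
         annotations:
           # include the dynamic env so SCM pods roll on config/replica changes
           checksum/config: {{ include (print $.Template.BasePath "/ozone-configmap.yaml") . | cat (include "ozone.configuration.env" .) | sha256sum }}
   ```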



##########
charts/ozone/templates/om/om-bootstrap-configmap.yaml:
##########
@@ -0,0 +1,78 @@
+{{- if and .Values.om.persistence.enabled (gt (len (ternary (splitList "," (include "ozone.om.bootstrap.nodes" .)) (list) (ne "" (include "ozone.om.bootstrap.nodes" .)))) 0) }}
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: {{ .Release.Name }}-om-bootstrap-script
+  labels:
+    {{- include "ozone.labels" . | nindent 4 }}
+    app.kubernetes.io/component: om
+data:
+  om-bootstrap.sh: |-
+    #!/bin/sh
+    set -e
+
+    HELM_MANAGER_PATH="{{ .Values.om.persistence.path }}{{ .Values.helm.persistence.path }}"
+    HELM_MANAGER_BOOTSTRAPPED_FILE="$HELM_MANAGER_PATH/bootstrapped"
+
+    # These are templated from Helm
+    OZONE_OM_ARGS_LIST="{{- range .Values.om.args }} {{ . }} {{- end }}"
+    OZONE_OM_BOOTSTRAP_NODES="{{ include "ozone.om.bootstrap.nodes" . }}"
+    OZONE_OM_CLUSTER_IDS="{{ include "ozone.om.cluster.ids" . }}"
+    OZONE_CLUSTER_ID="{{ .Values.clusterId }}"
+
+    if [ -z "$OZONE_OM_BOOTSTRAP_NODES" ]; then
+      echo "No bootstrap handling needed!"
+      exit 0
+    fi
+
+    joinArr() {
+      local IFS=","
+      echo "$*"
+    }
+

Review Comment:
   Network issues or other glitches might cause transient failures, so adding retry logic would improve reliability.
   
   Below is an example with a simple retry; if you can extend this snippet to exponential backoff with minimal code changes, even better (see the sketch after the example).
   
   ```
   run_bootstrap() {
     local overwriteCmd="$1"
     local max_attempts=3
     local attempt=1
     
     while [ $attempt -le $max_attempts ]; do
       echo "Bootstrap attempt $attempt of $max_attempts"
       
       if ozone admin om --set "ozone.om.nodes.$OZONE_CLUSTER_ID=$overwriteCmd" --bootstrap; then
         echo "$HOSTNAME was successfully bootstrapped!"
         mkdir -p "$HELM_MANAGER_PATH"
         touch "$HELM_MANAGER_BOOTSTRAPPED_FILE"
         exit 0
       else
         echo "Bootstrap failed with exit code $?, attempt $attempt"
         attempt=$((attempt + 1))
         [ $attempt -le $max_attempts ] && sleep 10
       fi
     done
     
     echo "Bootstrap failed after $max_attempts attempts"
     exit 1
   }
   ```
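   
   A minimal sketch of the exponential variant, reusing the same loop and the same bootstrap command as above (the delay values are just placeholders):
   ```
   run_bootstrap() {
     local overwriteCmd="$1"
     local max_attempts=5
     local attempt=1
     local delay=5   # initial backoff in seconds, doubled after each failure
     
     while [ $attempt -le $max_attempts ]; do
       echo "Bootstrap attempt $attempt of $max_attempts"
       
       if ozone admin om --set "ozone.om.nodes.$OZONE_CLUSTER_ID=$overwriteCmd" --bootstrap; then
         echo "$HOSTNAME was successfully bootstrapped!"
         mkdir -p "$HELM_MANAGER_PATH"
         touch "$HELM_MANAGER_BOOTSTRAPPED_FILE"
         exit 0
       fi
       
       echo "Bootstrap attempt $attempt failed, retrying in ${delay}s"
       attempt=$((attempt + 1))
       [ $attempt -le $max_attempts ] && sleep "$delay"
       delay=$((delay * 2))
     done
     
     echo "Bootstrap failed after $max_attempts attempts"
     exit 1
   }
   ```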



##########
charts/ozone/templates/om/om-statefulset.yaml:
##########
@@ -39,25 +40,45 @@ spec:
   template:
     metadata:
       annotations:
-        checksum/config: {{ include (print $.Template.BasePath "/ozone-configmap.yaml") . | sha256sum }}
+        checksum/config: {{ include (print $.Template.BasePath "/ozone-configmap.yaml") . | cat (include "ozone.configuration.env" .) | sha256sum }}
       labels:
         {{- include "ozone.selectorLabels" . | nindent 8 }}
         app.kubernetes.io/component: om
     spec:
+      {{- if and .Values.om.persistence.enabled (gt (len $bnodes) 0) }}

Review Comment:
   During pod termination, OM/SCM stops immediately. This may interrupt Ratis log replication and could cause uncommitted transactions to be lost, couldn't it? Can we think about configuring a _preStop_ hook or a _graceful termination period_?
   
   If you agree, this could also be implemented in a follow-up task, since it requires some additional testing effort.
   
   Below is a sample snippet, only for quick reference.
   ```
   spec:
     template:
       spec:
         terminationGracePeriodSeconds: 120  # Allow time for graceful shutdown
         containers:
           - name: om
             lifecycle:
               preStop:
                 exec:
                   command:
                     - sh
                     - -c
                     - |
                       # Transfer leadership if this is the leader
                       LEADER=$(ozone admin om roles -id {{ .Values.clusterId }} | grep LEADER | awk '{print $1}')
                       if [ "$LEADER" = "$(hostname)" ]; then
                         echo "This OM is leader, transferring leadership..."
                         ozone admin om transfer -id {{ .Values.clusterId }} -r || true
                         sleep 10
                       fi
                       # Graceful shutdown
                       kill -TERM 1
                       sleep 30
   ```



##########
charts/ozone/templates/_helpers.tpl:
##########
@@ -51,26 +51,166 @@ app.kubernetes.io/instance: {{ .Release.Name }}
   {{- $pods | join "," }}
 {{- end }}
 
-{{/* Common configuration environment variables */}}
-{{- define "ozone.configuration.env" -}}
+{{/* List of comma separated om ids */}}
+{{- define "ozone.om.cluster.ids" -}}
+  {{- $pods := list }}
+  {{- $replicas := .Values.om.replicas | int }}
+  {{- range $i := until $replicas }}
+    {{- $pods = append $pods (printf "%s-om-%d" $.Release.Name $i) }}
+  {{- end }}
+  {{- $pods | join "," }}
+{{- end }}
+
+{{/* List of comma separated scm ids */}}
+{{- define "ozone.scm.cluster.ids" -}}
+  {{- $pods := list }}
+  {{- $replicas := .Values.scm.replicas | int }}
+  {{- range $i := until $replicas }}
+    {{- $pods = append $pods (printf "%s-scm-%d" $.Release.Name $i) }}
+  {{- end }}
+  {{- $pods | join "," }}
+{{- end }}
+
+{{/* List of decommission om nodes */}}
+{{- define "ozone.om.decommissioned.nodes" -}}
+    {{- $nodes := list }}
+    {{- $statefulset := lookup "apps/v1" "StatefulSet" $.Release.Namespace (printf "%s-om" $.Release.Name) -}}
+    {{- if $statefulset }}
+      {{- $oldCount := $statefulset.spec.replicas | int -}}
+      {{- $newCount := .Values.om.replicas | int }}
+        {{- range $i := until $oldCount }}
+          {{- $minCount := max $newCount 1 -}}
+          {{- if ge $i $minCount }}
+            {{- $nodes = append $nodes (printf "%s-om-%d" $.Release.Name $i) }}
+          {{- end }}
+        {{- end }}
+    {{- end }}
+    {{- $nodes | join "," }}
+{{- end }}
+
+{{/* List of bootstrap om nodes */}}
+{{- define "ozone.om.bootstrap.nodes" -}}
+    {{- $nodes := list }}
+    {{- $statefulset := lookup "apps/v1" "StatefulSet" $.Release.Namespace (printf "%s-om" $.Release.Name) -}}
+    {{- if $statefulset }}
+      {{- $oldCount := $statefulset.spec.replicas | int -}}
+      {{- $newCount := .Values.om.replicas | int }}
+        {{- range $i := until $newCount }}
+          {{- if ge $i $oldCount }}
+            {{- $nodes = append $nodes (printf "%s-om-%d" $.Release.Name $i) }}
+          {{- end }}
+        {{- end }}
+    {{- end }}
+    {{- $nodes | join ","}}
+{{- end }}
+
+{{/* List of decommission scm nodes */}}
+{{- define "ozone.scm.decommissioned.nodes" -}}
+    {{- $nodes := list }}
+    {{- $statefulset := lookup "apps/v1" "StatefulSet" $.Release.Namespace (printf "%s-scm" $.Release.Name) -}}
+    {{- if $statefulset }}
+      {{- $oldCount := $statefulset.spec.replicas | int -}}
+      {{- $newCount := .Values.scm.replicas | int }}
+        {{- range $i := until $oldCount }}
+          {{- if ge $i $newCount }}
+            {{- $nodes = append $nodes (printf "%s-scm-%d" $.Release.Name $i) }}
+          {{- end }}
+        {{- end }}
+    {{- end }}
+    {{- $nodes | join "," -}}
+{{- end }}
+
+{{/* List of decommission data nodes */}}
+{{- define "ozone.data.decommissioned.hosts" -}}
+    {{- $hosts := list }}
+    {{- $statefulset := lookup "apps/v1" "StatefulSet" $.Release.Namespace (printf "%s-datanode" $.Release.Name) -}}
+    {{- if $statefulset }}
+      {{- $oldCount := $statefulset.spec.replicas | int -}}
+      {{- $newCount := .Values.datanode.replicas | int }}
+        {{- range $i := until $oldCount }}
+          {{- if ge $i $newCount }}
+            {{- $hosts = append $hosts (printf "%s-datanode-%d.%s-datanode-headless.%s.svc.cluster.local" $.Release.Name $i $.Release.Name $.Release.Namespace) }}
+          {{- end }}
+        {{- end }}
+    {{- end }}
+    {{- $hosts | join "," -}}
+{{- end }}
+
+{{- define "ozone.configuration.env.common" -}}
 - name: OZONE-SITE.XML_hdds.datanode.dir
   value: /data/storage
 - name: OZONE-SITE.XML_ozone.scm.datanode.id.dir
   value: /data/metadata
 - name: OZONE-SITE.XML_ozone.metadata.dirs
   value: /data/metadata
-- name: OZONE-SITE.XML_ozone.scm.block.client.address
-  value: {{ include "ozone.scm.pods" . }}
-- name: OZONE-SITE.XML_ozone.scm.client.address
-  value: {{ include "ozone.scm.pods" . }}
-- name: OZONE-SITE.XML_ozone.scm.names
-  value: {{ include "ozone.scm.pods" . }}
-- name: OZONE-SITE.XML_ozone.om.address
-  value: {{ include "ozone.om.pods" . }}
+- name: OZONE-SITE.XML_ozone.scm.ratis.enable
+  value: "true"
+- name: OZONE-SITE.XML_ozone.scm.service.ids
+  value: {{ .Values.clusterId }}
+- name: OZONE-SITE.XML_ozone.scm.nodes.{{ .Values.clusterId }}
+  value: {{ include "ozone.scm.cluster.ids" . }}
+  {{/*- name: OZONE-SITE.XML_ozone.scm.skip.bootstrap.validation*/}}
+  {{/*  value: {{ quote .Values.scm.skipBootstrapValidation }}*/}}
+{{- range $i, $val := until ( .Values.scm.replicas | int ) }}
+- name: {{ printf "OZONE-SITE.XML_ozone.scm.address.%s.%s-scm-%d" $.Values.clusterId $.Release.Name $i }}
+  value: {{ printf "%s-scm-%d.%s-scm-headless.%s.svc.cluster.local" $.Release.Name $i $.Release.Name $.Release.Namespace }}
+{{- end }}
+- name: OZONE-SITE.XML_ozone.scm.primordial.node.id
+  value: {{ printf "%s-scm-0" $.Release.Name }}
+- name: OZONE-SITE.XML_ozone.om.ratis.enable
+  value: "true"
+- name: OZONE-SITE.XML_ozone.om.service.ids
+  value: {{ .Values.clusterId }}
 - name: OZONE-SITE.XML_hdds.scm.safemode.min.datanode
   value: "3"
 - name: OZONE-SITE.XML_ozone.datanode.pipeline.limit
   value: "1"
 - name: OZONE-SITE.XML_dfs.datanode.use.datanode.hostname
   value: "true"
 {{- end }}
+
+{{/* Common configuration environment variables */}}
+{{- define "ozone.configuration.env" -}}
+{{- $bOmNodes := ternary (splitList "," (include "ozone.om.bootstrap.nodes" .)) (list) (ne "" (include "ozone.om.bootstrap.nodes" .)) }}
+{{- $dOmNodes := ternary (splitList "," (include "ozone.om.decommissioned.nodes" .)) (list) (ne "" (include "ozone.om.decommissioned.nodes" .)) }}
+{{- $activeOmNodes := ternary (splitList "," (include "ozone.om.cluster.ids" .)) (list) (ne "" (include "ozone.om.cluster.ids" .)) }}
+{{ include "ozone.configuration.env.common" . }}
+{{- if gt (len $dOmNodes) 0 }}
+{{- $decomIds := $dOmNodes | join "," }}
+- name: OZONE-SITE.XML_ozone.om.decommissioned.nodes.{{ .Values.clusterId }}
+  value: {{ $decomIds }}
+{{- else}}
+- name: OZONE-SITE.XML_ozone.om.decommissioned.nodes.{{ .Values.clusterId }}
+  value: ""
+{{- end }}
+- name: OZONE-SITE.XML_ozone.om.nodes.{{ .Values.clusterId }}
+  value: {{ $activeOmNodes | join "," }}
+{{- range $tempId := $activeOmNodes }}
+- name: {{ printf "OZONE-SITE.XML_ozone.om.address.%s.%s" $.Values.clusterId $tempId }}
+  value: {{ printf "%s.%s-om-headless.%s.svc.cluster.local" $tempId $.Release.Name $.Release.Namespace }}
+{{- end }}
+{{- range $tempId := $dOmNodes }}
+- name: {{ printf "OZONE-SITE.XML_ozone.om.address.%s.%s" $.Values.clusterId $tempId }}
+  value: {{ printf "%s-helm-manager-decommission-%s-svc.%s.svc.cluster.local" $.Release.Name $tempId $.Release.Namespace }}
+{{- end }}
+{{- end }}
+
+{{/* Common configuration environment variables for pre hook */}}
+{{- define "ozone.configuration.env.prehook" -}}
+{{- $bOmNodes := ternary (splitList "," (include "ozone.om.bootstrap.nodes" .)) (list) (ne "" (include "ozone.om.bootstrap.nodes" .)) }}
+{{- $dOmNodes := ternary (splitList "," (include "ozone.om.decommissioned.nodes" .)) (list) (ne "" (include "ozone.om.decommissioned.nodes" .)) }}
+{{- $activeOmNodes := ternary (splitList "," (include "ozone.om.cluster.ids" .)) (list) (ne "" (include "ozone.om.cluster.ids" .)) }}
+{{- $allOmNodes := concat $activeOmNodes $dOmNodes }}
+{{ include "ozone.configuration.env.common" . }}
+- name: OZONE-SITE.XML_ozone.om.decommissioned.nodes.{{ .Values.clusterId }}
+  value: ""
+{{- range $tempId := $allOmNodes }}
+- name: {{ printf "OZONE-SITE.XML_ozone.om.address.%s.%s" $.Values.clusterId $tempId }}
+  value: {{ printf "%s.%s-om-headless.%s.svc.cluster.local" $tempId $.Release.Name $.Release.Namespace }}
+{{- end }}
+{{ $allOmNodes = append $allOmNodes "om-leader-transfer"}}
+- name: OZONE-SITE.XML_ozone.om.nodes.{{ .Values.clusterId }}
+  value: {{ $allOmNodes | join "," }}
+- name: "OZONE-SITE.XML_ozone.om.address.{{ .Values.clusterId 
}}.om-leader-transfer"
+  value: localhost
+{{- end }}

Review Comment:
   It's good to consider Ratis configurations to handle network partitions or slow networks. You could raise a follow-up task to find out the necessary Ratis configurations and add them, e.g. around (a rough sketch of where they would plug in follows this list):
   
   1. Ratis timeouts for network resilience
   2. Heartbeat settings
   3. Leader election
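   
   As an illustration only, such settings could be injected through the same OZONE-SITE.XML_ env pattern used in _helpers.tpl; the property names and values below are placeholders to be verified against ozone-default.xml, not a vetted list:
   ```
   # hypothetical tuning knobs -- confirm exact property names and sensible values first
   - name: OZONE-SITE.XML_ozone.om.ratis.minimum.timeout
     value: "5s"
   - name: OZONE-SITE.XML_ozone.om.ratis.server.request.timeout
     value: "3s"
   ```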



##########
charts/ozone/templates/helm/om-leader-transfer-job.yaml:
##########
@@ -0,0 +1,82 @@
+{{- if .Values.om.persistence.enabled }}
+{{- $dnodes := ternary (splitList "," (include "ozone.om.decommissioned.nodes" .)) (list) (ne "" (include "ozone.om.decommissioned.nodes" .)) }}
+{{- $env := concat .Values.env .Values.helm.env }}
+{{- $envFrom := concat .Values.envFrom .Values.helm.envFrom }}
+{{- $nodeSelector := or .Values.helm.nodeSelector .Values.nodeSelector }}
+{{- $affinity := or .Values.helm.affinity .Values.affinity }}
+{{- $tolerations := or .Values.helm.tolerations .Values.tolerations }}
+{{- $securityContext := or .Values.helm.securityContext .Values.securityContext }}
+{{- if (gt (len $dnodes) 0) }}
+---
+apiVersion: batch/v1
+kind: Job
+metadata:
+  name: {{ printf "%s-helm-manager-leader-transfer" $.Release.Name }}
+  labels:
+    {{- include "ozone.labels" $ | nindent 4 }}
+    app.kubernetes.io/component: helm-manager
+  annotations:
+    "helm.sh/hook": pre-upgrade
+    "helm.sh/hook-weight": "0"
+    "helm.sh/hook-delete-policy": hook-succeeded,hook-failed
+spec:
+  backoffLimit: {{ $.Values.helm.backoffLimit }}
+  template:
+    metadata:
+      labels:
+        {{- include "ozone.selectorLabels" $ | nindent 8 }}
+        app.kubernetes.io/component: helm-manager
+    spec:
+      containers:
+        - name: om-leader-transfer
+          image: "{{ $.Values.image.repository }}:{{ $.Values.image.tag | default $.Chart.AppVersion }}"
+          imagePullPolicy: {{ $.Values.image.pullPolicy }}
+          {{- with $.Values.om.command }}
+          command: {{- tpl (toYaml .) $ | nindent 12 }}
+          {{- end }}
+          args:
+            - sh
+            - -c
+            - |
+              set -e
+              exec ozone admin om transfer -id={{ $.Values.clusterId }} -n={{ $.Release.Name }}-om-0
+          env:
+            {{- include "ozone.configuration.env.prehook" $ | nindent 12 }}
+            {{- with $env }}
+              {{- tpl (toYaml .) $ | nindent 12 }}
+            {{- end }}
+          {{- with $envFrom }}
+          envFrom: 
+            {{- tpl (toYaml .) $ | nindent 12 }}
+          {{- end }}
+          ports:
+            - name: om-rpc
+              containerPort: {{ $.Values.om.service.rpcPort }}
+            - name: om-ratis
+              containerPort: {{ $.Values.om.service.ratisPort }}

Review Comment:
   The Ratis port is exposed even when om.replicas = 1 (non-HA mode); in single-node mode, Ratis is not needed.
   
   Suggestion:
   `{{- if gt (int .Values.om.replicas) 1 }}`
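   
   i.e., roughly the following in the job template (a sketch based on the ports from the hunk above):
   ```
   ports:
     - name: om-rpc
       containerPort: {{ $.Values.om.service.rpcPort }}
     {{- if gt (int $.Values.om.replicas) 1 }}
     - name: om-ratis
       containerPort: {{ $.Values.om.service.ratisPort }}
     {{- end }}
   ```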



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

