ptlrs commented on code in PR #20:
URL: https://github.com/apache/ozone-helm-charts/pull/20#discussion_r2446780472


##########
charts/ozone/templates/datanode/datanode-service-headless.yaml:
##########
@@ -28,6 +28,10 @@ spec:
   ports:
     - name: ui
       port: {{ .Values.datanode.service.port }}
+    - name: ratis-ipc
+      port: 9858
+    - name: ipc
+      port: 9859

Review Comment:
   We should reference these ports from the values file, here and in the other templates.
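   A minimal sketch of what that could look like (the `datanode.ports.*` keys are hypothetical and would need to be added to `values.yaml`):
   
   ```yaml
   # values.yaml (hypothetical keys)
   datanode:
     ports:
       ratisIpc: 9858
       ipc: 9859
   ```
   
   ```yaml
   # datanode-service-headless.yaml
   - name: ratis-ipc
     port: {{ .Values.datanode.ports.ratisIpc }}
   - name: ipc
     port: {{ .Values.datanode.ports.ipc }}
   ```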



##########
charts/ozone/templates/helm/om-manager.yaml:
##########
@@ -0,0 +1,239 @@
+{{- if or .Values.om.persistence.enabled }}
+{{- $dnodes := ternary (splitList "," (include "ozone.om.decommissioned.nodes" .)) (list) (ne "" (include "ozone.om.decommissioned.nodes" .)) }}
+{{- $env := concat .Values.env .Values.helm.env }}
+{{- $envFrom := concat .Values.envFrom .Values.helm.envFrom }}
+{{- $nodeSelector := or .Values.helm.nodeSelector .Values.nodeSelector }}
+{{- $affinity := or .Values.helm.affinity .Values.affinity }}
+{{- $tolerations := or .Values.helm.tolerations .Values.tolerations }}
+{{- $securityContext := or .Values.helm.securityContext .Values.securityContext }}
+{{- if and (gt (len $dnodes) 0) ( .Values.om.persistence.enabled) }}
+
+apiVersion: batch/v1
+kind: Job
+metadata:
+  name: {{ printf "%s-helm-manager-leader-transfer" $.Release.Name }}
+  labels:
+    {{- include "ozone.labels" $ | nindent 4 }}
+    app.kubernetes.io/component: helm-manager
+  annotations:
+    "helm.sh/hook": pre-upgrade
+    "helm.sh/hook-weight": "0"
+    "helm.sh/hook-delete-policy": hook-succeeded, hook-failed
+spec:
+  backoffLimit: {{ $.Values.helm.backoffLimit }}
+  template:
+    metadata:
+      labels:
+        {{- include "ozone.selectorLabels" $ | nindent 8 }}
+        app.kubernetes.io/component: helm-manager
+    spec:
+      containers:
+        - name: om-leader-transfer
+          image: "{{ $.Values.image.repository }}:{{ $.Values.image.tag | default $.Chart.AppVersion }}"
+          imagePullPolicy: {{ $.Values.image.pullPolicy }}
+          {{- with $.Values.om.command }}
+          command: {{- tpl (toYaml .) $ | nindent 12 }}
+          {{- end }}
+          args:
+            - sh
+            - -c
+            - |
+              set -e
+              exec ozone admin om transfer -id=cluster1 -n={{ $.Release.Name }}-om-0
+          env:
+            {{- include "ozone.configuration.env.prehook" $ | nindent 12 }}
+            {{- with $env }}
+              {{- tpl (toYaml .) $ | nindent 12 }}
+            {{- end }}
+          {{- with $envFrom }}
+          envFrom: {{- tpl (toYaml .) $ | nindent 12 }}
+          {{- end }}
+          ports:
+            - name: data-ratis-ipc
+              containerPort: 9858
+            - name: data-ipc
+              containerPort: 9859
+            - name: scm-rpc-client
+              containerPort: 9860
+            - name: scm-block-cl
+              containerPort: 9863
+            - name: scm-rpc-data
+              containerPort: 9861
+            - name: scm-ratis
+              containerPort: 9894
+            - name: scm-grpc
+              containerPort: 9895
+            - name: om-rpc
+              containerPort: 9862
+            - name: om-ratis
+              containerPort: 9872
+          volumeMounts:
+            - name: config
+              mountPath: {{ $.Values.configuration.dir }}
+            - name: om-data
+              mountPath: {{ $.Values.om.persistence.path }}
+      {{- with $nodeSelector }}
+      nodeSelector: {{- toYaml . | nindent 8 }}
+      {{- end }}
+      {{- with $affinity }}
+      affinity: {{- toYaml . | nindent 8 }}
+      {{- end }}
+      {{- with $tolerations }}
+      tolerations: {{- toYaml . | nindent 8 }}
+      {{- end }}
+      {{- with $securityContext }}
+      securityContext: {{- toYaml . | nindent 8 }}
+      {{- end }}

Review Comment:
   Do we need these for performing a leader transfer? If we want to restrict 
where this job can/should run, we should document that motivation. 



##########
charts/ozone/templates/helm/om-manager.yaml:
##########
@@ -0,0 +1,239 @@
+{{- if or .Values.om.persistence.enabled }}
+{{- $dnodes := ternary (splitList "," (include "ozone.om.decommissioned.nodes" .)) (list) (ne "" (include "ozone.om.decommissioned.nodes" .)) }}
+{{- $env := concat .Values.env .Values.helm.env }}
+{{- $envFrom := concat .Values.envFrom .Values.helm.envFrom }}
+{{- $nodeSelector := or .Values.helm.nodeSelector .Values.nodeSelector }}
+{{- $affinity := or .Values.helm.affinity .Values.affinity }}
+{{- $tolerations := or .Values.helm.tolerations .Values.tolerations }}
+{{- $securityContext := or .Values.helm.securityContext .Values.securityContext }}
+{{- if and (gt (len $dnodes) 0) ( .Values.om.persistence.enabled) }}
+
+apiVersion: batch/v1
+kind: Job
+metadata:
+  name: {{ printf "%s-helm-manager-leader-transfer" $.Release.Name }}
+  labels:
+    {{- include "ozone.labels" $ | nindent 4 }}
+    app.kubernetes.io/component: helm-manager
+  annotations:
+    "helm.sh/hook": pre-upgrade
+    "helm.sh/hook-weight": "0"
+    "helm.sh/hook-delete-policy": hook-succeeded, hook-failed
+spec:
+  backoffLimit: {{ $.Values.helm.backoffLimit }}
+  template:
+    metadata:
+      labels:
+        {{- include "ozone.selectorLabels" $ | nindent 8 }}
+        app.kubernetes.io/component: helm-manager
+    spec:
+      containers:
+        - name: om-leader-transfer
+          image: "{{ $.Values.image.repository }}:{{ $.Values.image.tag | default $.Chart.AppVersion }}"
+          imagePullPolicy: {{ $.Values.image.pullPolicy }}
+          {{- with $.Values.om.command }}
+          command: {{- tpl (toYaml .) $ | nindent 12 }}
+          {{- end }}
+          args:
+            - sh
+            - -c
+            - |
+              set -e
+              exec ozone admin om transfer -id=cluster1 -n={{ $.Release.Name }}-om-0
+          env:
+            {{- include "ozone.configuration.env.prehook" $ | nindent 12 }}
+            {{- with $env }}
+              {{- tpl (toYaml .) $ | nindent 12 }}
+            {{- end }}
+          {{- with $envFrom }}
+          envFrom: {{- tpl (toYaml .) $ | nindent 12 }}
+          {{- end }}
+          ports:
+            - name: data-ratis-ipc
+              containerPort: 9858
+            - name: data-ipc
+              containerPort: 9859
+            - name: scm-rpc-client
+              containerPort: 9860
+            - name: scm-block-cl
+              containerPort: 9863
+            - name: scm-rpc-data
+              containerPort: 9861
+            - name: scm-ratis
+              containerPort: 9894
+            - name: scm-grpc
+              containerPort: 9895
+            - name: om-rpc
+              containerPort: 9862
+            - name: om-ratis
+              containerPort: 9872

Review Comment:
   Do we need all these ports for the OM leader transfer?



##########
charts/ozone/templates/om/om-statefulset.yaml:
##########
@@ -51,9 +53,66 @@ spec:
           {{- with .Values.om.command }}
           command: {{- tpl (toYaml .) $ | nindent 12 }}
           {{- end }}
-          {{- with .Values.om.args }}
-          args: {{- tpl (toYaml .) $ | nindent 12 }}
-          {{- end }}
+          args:
+            - sh

Review Comment:
   The bootstrapping action is needed only when a pod starts for the first time. Such logic should be moved to an initContainer, which performs this action before the main container starts.
   
   Another improvement would be to put this logic in a bash script and call it from the initContainer instead of keeping complex logic inside the Helm templates.
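   A rough sketch of the shape this could take (the script name `bootstrap-om.sh` and the `scripts` ConfigMap volume are only illustrative):
   
   ```yaml
   initContainers:
     - name: om-bootstrap
       image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
       # Bootstrap script shipped via a ConfigMap instead of inline template logic
       command: ["bash", "/opt/ozone/scripts/bootstrap-om.sh"]
       volumeMounts:
         - name: scripts
           mountPath: /opt/ozone/scripts
         - name: om-data
           mountPath: {{ .Values.om.persistence.path }}
   ```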



##########
charts/ozone/templates/om/om-statefulset.yaml:
##########
@@ -51,9 +53,66 @@ spec:
           {{- with .Values.om.command }}
           command: {{- tpl (toYaml .) $ | nindent 12 }}
           {{- end }}
-          {{- with .Values.om.args }}
-          args: {{- tpl (toYaml .) $ | nindent 12 }}
-          {{- end }}
+          args:
+            - sh
+            - -c
+            - |
+              set -e
+              HELM_MANAGER_PATH="{{ .Values.om.persistence.path }}{{ .Values.helm.persistence.path }}"
+              HELM_MANAGER_BOOTSTRAPPED_FILE="$HELM_MANAGER_PATH/bootstrapped"
+              {{- $flattenedArgs := join " " $.Values.om.args }}
+              {{- if and (.Values.om.persistence.enabled) (gt (len $bnodes) 0) }}
+                joinArr() {
+                  local IFS=","
+                  echo "$*"
+                }
+                bootstrap_finalizer() {
+                  echo "Init bootrap finalizer process..."
+                  while true; do
+                    IFS= read -r line;
+                    echo "$line"
+                      if echo "$line" | grep -q "Successfully bootstrapped OM $HOSTNAME and joined the Ratis group"; then

Review Comment:
   This can be brittle. Can we rely on the exit code instead of parsing a log 
line to determine success?
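   A sketch of the idea, assuming the bootstrap step can be run (or wrapped) so that it exits non-zero on failure, which may not match how `ozone om --bootstrap` behaves today:
   
   ```yaml
   args:
     - sh
     - -c
     - |
       set -e
       # Rely on the command's exit status instead of grepping log output.
       # (Other flags from the existing args are omitted here.)
       if ozone om --bootstrap; then
         mkdir -p "$HELM_MANAGER_PATH"
         touch "$HELM_MANAGER_BOOTSTRAPPED_FILE"
       else
         echo "Bootstrap of $HOSTNAME failed" >&2
         exit 1
       fi
   ```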
   



##########
charts/ozone/templates/helm/om-manager.yaml:
##########
@@ -0,0 +1,239 @@
+{{- if or .Values.om.persistence.enabled }}
+{{- $dnodes := ternary (splitList "," (include "ozone.om.decommissioned.nodes" .)) (list) (ne "" (include "ozone.om.decommissioned.nodes" .)) }}
+{{- $env := concat .Values.env .Values.helm.env }}
+{{- $envFrom := concat .Values.envFrom .Values.helm.envFrom }}
+{{- $nodeSelector := or .Values.helm.nodeSelector .Values.nodeSelector }}
+{{- $affinity := or .Values.helm.affinity .Values.affinity }}
+{{- $tolerations := or .Values.helm.tolerations .Values.tolerations }}
+{{- $securityContext := or .Values.helm.securityContext .Values.securityContext }}
+{{- if and (gt (len $dnodes) 0) ( .Values.om.persistence.enabled) }}
+
+apiVersion: batch/v1
+kind: Job

Review Comment:
   There are 2 jobs and 1 service being created in this file.
   
   The OM leader transfer job is a pre-upgrade hook and creates 1 job.
   The OM decommissioning job is a post-upgrade hook and creates N jobs.
   
   These jobs and the service should be split into multiple files.
   
   Also, instead of `om-manager.yaml`, the resulting files should have more descriptive names such as `om-leader-transfer-job.yaml` and `om-decommission-job.yaml`.



##########
charts/ozone/templates/helm/om-manager.yaml:
##########
@@ -0,0 +1,239 @@
+{{- if or .Values.om.persistence.enabled }}
+{{- $dnodes := ternary (splitList "," (include "ozone.om.decommissioned.nodes" .)) (list) (ne "" (include "ozone.om.decommissioned.nodes" .)) }}
+{{- $env := concat .Values.env .Values.helm.env }}
+{{- $envFrom := concat .Values.envFrom .Values.helm.envFrom }}
+{{- $nodeSelector := or .Values.helm.nodeSelector .Values.nodeSelector }}
+{{- $affinity := or .Values.helm.affinity .Values.affinity }}
+{{- $tolerations := or .Values.helm.tolerations .Values.tolerations }}
+{{- $securityContext := or .Values.helm.securityContext .Values.securityContext }}
+{{- if and (gt (len $dnodes) 0) ( .Values.om.persistence.enabled) }}
+
+apiVersion: batch/v1
+kind: Job
+metadata:
+  name: {{ printf "%s-helm-manager-leader-transfer" $.Release.Name }}
+  labels:
+    {{- include "ozone.labels" $ | nindent 4 }}
+    app.kubernetes.io/component: helm-manager
+  annotations:
+    "helm.sh/hook": pre-upgrade
+    "helm.sh/hook-weight": "0"
+    "helm.sh/hook-delete-policy": hook-succeeded, hook-failed
+spec:
+  backoffLimit: {{ $.Values.helm.backoffLimit }}
+  template:
+    metadata:
+      labels:
+        {{- include "ozone.selectorLabels" $ | nindent 8 }}
+        app.kubernetes.io/component: helm-manager
+    spec:
+      containers:
+        - name: om-leader-transfer
+          image: "{{ $.Values.image.repository }}:{{ $.Values.image.tag | default $.Chart.AppVersion }}"
+          imagePullPolicy: {{ $.Values.image.pullPolicy }}
+          {{- with $.Values.om.command }}
+          command: {{- tpl (toYaml .) $ | nindent 12 }}
+          {{- end }}
+          args:
+            - sh
+            - -c
+            - |
+              set -e
+              exec ozone admin om transfer -id=cluster1 -n={{ $.Release.Name }}-om-0

Review Comment:
   The motivation for this job is not clear to me. Are we going to run it on every `helm upgrade`?
   
   1. Do we actually need to make `om-0` the leader before an upgrade?
   2. What happens if `om-0` is not functional or the leader transfer fails? It appears that we don't check the result.
   3. Should the upgrade fail if the transfer fails?
   
   We should also find a way to not hardcode the `cluster1` id.
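   For example, the service id could come from a values key (the `om.serviceId` key below is hypothetical):
   
   ```yaml
   exec ozone admin om transfer -id={{ $.Values.om.serviceId | default "cluster1" }} -n={{ $.Release.Name }}-om-0
   ```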
   



##########
charts/ozone/templates/om/om-statefulset.yaml:
##########
@@ -51,9 +53,66 @@ spec:
           {{- with .Values.om.command }}
           command: {{- tpl (toYaml .) $ | nindent 12 }}
           {{- end }}
-          {{- with .Values.om.args }}
-          args: {{- tpl (toYaml .) $ | nindent 12 }}
-          {{- end }}
+          args:
+            - sh
+            - -c
+            - |
+              set -e
+              HELM_MANAGER_PATH="{{ .Values.om.persistence.path }}{{ .Values.helm.persistence.path }}"
+              HELM_MANAGER_BOOTSTRAPPED_FILE="$HELM_MANAGER_PATH/bootstrapped"
+              {{- $flattenedArgs := join " " $.Values.om.args }}
+              {{- if and (.Values.om.persistence.enabled) (gt (len $bnodes) 0) }}
+                joinArr() {
+                  local IFS=","
+                  echo "$*"
+                }
+                bootstrap_finalizer() {
+                  echo "Init bootrap finalizer process..."
+                  while true; do
+                    IFS= read -r line;
+                    echo "$line"
+                      if echo "$line" | grep -q "Successfully bootstrapped OM $HOSTNAME and joined the Ratis group"; then
+                      echo "$HOSTNAME was successfully bootstrapped!"
+                      mkdir -p "$HELM_MANAGER_PATH"
+                      touch "$HELM_MANAGER_BOOTSTRAPPED_FILE"
+                      break;
+                    fi
+                  done
+                  echo "Bootstrap finalizer process finished!"
+                  exit 0
+                }
+                bootstrapHosts="{{ join "," $bnodes }}"
+                echo "Need to handle bootstrap for nodes $bootstrapHosts"
+                IFS=',' read -r -a hostArray <<< "$bootstrapHosts"
+                doBootstrap=false
+                nodesConfigOverwriteList=()
+                for host in "${hostArray[@]}"; do
+                  if [[ "$host" == "$HOSTNAME" ]]; then
+                    doBootstrap=true
+                    activeNodesConfig="{{ join "," $activeNodes }}"
+                    IFS=',' read -r -a overwriteArray <<< "$activeNodesConfig"
+                    for overwriteHost in "${overwriteArray[@]}"; do
+                      nodesConfigOverwriteList+=("$overwriteHost")
+                      if [[ "$overwriteHost" == "$HOSTNAME" ]]; then
+                        break;
+                      fi
+                    done
+                    break
+                  fi
+                done
+                if [ "$doBootstrap" = true ] && [ ! -f 
"$HELM_MANAGER_BOOTSTRAPPED_FILE" ]; then
+                  echo "$HOSTNAME must be started with bootstrap arg!"
+                  overwriteCmd="$(joinArr "${nodesConfigOverwriteList[@]}")"
+                  echo "Bootstrapping node config for this node: $overwriteCmd"
+                  exec {{ printf "%s --set ozone.om.nodes.cluster1=" $flattenedArgs }}"$overwriteCmd" --bootstrap | bootstrap_finalizer

Review Comment:
   Will the cluster name always be `cluster1` or does it need to be 
parameterized?



##########
charts/ozone/values.yaml:
##########
@@ -145,6 +145,49 @@ om:
     # The name of a specific storage class name to use
     storageClassName: ~
 
+# Storage Container Manager configuration
+scm:
+  # Number of Storage Container Manager replicas
+  replicas: 3
+  # Command to launch Storage Container Manager (templated)
+  command: ~
+  # Arguments to launch Storage Container Manager (templated)
+  args: ["ozone", "scm"]
+  # Additional Storage Container Manager environment variables (templated)
+  env: []
+  # Additional Storage Container Manager envFrom items to set up environment variables (templated)
+  envFrom: []
+  # Storage Container Manager resource requests and limits
+  resources: {}
+  # Constrain Storage Container Manager pods to nodes with specific node labels
+  nodeSelector: {}
+  # Constrain Storage Container Manager pods to nodes by affinity/anti-affinity rules
+  affinity: {}

Review Comment:
   Anti-affinity should be set by default so that multiple SCM pods do not run on the same node.
   
   Same for OM.
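   A sketch of what such a default could look like, assuming the SCM pods carry an `app.kubernetes.io/component: scm` label (the chart sets component labels elsewhere, so the exact value is an assumption):
   
   ```yaml
   scm:
     affinity:
       podAntiAffinity:
         requiredDuringSchedulingIgnoredDuringExecution:
           - labelSelector:
               matchLabels:
                 app.kubernetes.io/component: scm
             topologyKey: kubernetes.io/hostname
   ```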



##########
charts/ozone/templates/helm/om-manager.yaml:
##########
@@ -0,0 +1,239 @@
+{{- if or .Values.om.persistence.enabled }}
+{{- $dnodes := ternary (splitList "," (include "ozone.om.decommissioned.nodes" .)) (list) (ne "" (include "ozone.om.decommissioned.nodes" .)) }}
+{{- $env := concat .Values.env .Values.helm.env }}
+{{- $envFrom := concat .Values.envFrom .Values.helm.envFrom }}
+{{- $nodeSelector := or .Values.helm.nodeSelector .Values.nodeSelector }}
+{{- $affinity := or .Values.helm.affinity .Values.affinity }}
+{{- $tolerations := or .Values.helm.tolerations .Values.tolerations }}
+{{- $securityContext := or .Values.helm.securityContext .Values.securityContext }}
+{{- if and (gt (len $dnodes) 0) ( .Values.om.persistence.enabled) }}
+
+apiVersion: batch/v1
+kind: Job
+metadata:
+  name: {{ printf "%s-helm-manager-leader-transfer" $.Release.Name }}
+  labels:
+    {{- include "ozone.labels" $ | nindent 4 }}
+    app.kubernetes.io/component: helm-manager
+  annotations:
+    "helm.sh/hook": pre-upgrade
+    "helm.sh/hook-weight": "0"
+    "helm.sh/hook-delete-policy": hook-succeeded, hook-failed
+spec:
+  backoffLimit: {{ $.Values.helm.backoffLimit }}
+  template:
+    metadata:
+      labels:
+        {{- include "ozone.selectorLabels" $ | nindent 8 }}
+        app.kubernetes.io/component: helm-manager
+    spec:
+      containers:
+        - name: om-leader-transfer
+          image: "{{ $.Values.image.repository }}:{{ $.Values.image.tag | default $.Chart.AppVersion }}"
+          imagePullPolicy: {{ $.Values.image.pullPolicy }}
+          {{- with $.Values.om.command }}
+          command: {{- tpl (toYaml .) $ | nindent 12 }}
+          {{- end }}
+          args:
+            - sh
+            - -c
+            - |
+              set -e
+              exec ozone admin om transfer -id=cluster1 -n={{ $.Release.Name }}-om-0
+          env:
+            {{- include "ozone.configuration.env.prehook" $ | nindent 12 }}
+            {{- with $env }}
+              {{- tpl (toYaml .) $ | nindent 12 }}
+            {{- end }}
+          {{- with $envFrom }}
+          envFrom: {{- tpl (toYaml .) $ | nindent 12 }}
+          {{- end }}
+          ports:
+            - name: data-ratis-ipc
+              containerPort: 9858
+            - name: data-ipc
+              containerPort: 9859
+            - name: scm-rpc-client
+              containerPort: 9860
+            - name: scm-block-cl
+              containerPort: 9863
+            - name: scm-rpc-data
+              containerPort: 9861
+            - name: scm-ratis
+              containerPort: 9894
+            - name: scm-grpc
+              containerPort: 9895
+            - name: om-rpc
+              containerPort: 9862
+            - name: om-ratis
+              containerPort: 9872
+          volumeMounts:
+            - name: config
+              mountPath: {{ $.Values.configuration.dir }}
+            - name: om-data
+              mountPath: {{ $.Values.om.persistence.path }}
+      {{- with $nodeSelector }}
+      nodeSelector: {{- toYaml . | nindent 8 }}
+      {{- end }}
+      {{- with $affinity }}
+      affinity: {{- toYaml . | nindent 8 }}
+      {{- end }}
+      {{- with $tolerations }}
+      tolerations: {{- toYaml . | nindent 8 }}
+      {{- end }}
+      {{- with $securityContext }}
+      securityContext: {{- toYaml . | nindent 8 }}
+      {{- end }}
+      volumes:
+        - name: om-data
+          emptyDir: { }
+        - name: config
+          projected:
+            sources:
+              - configMap:
+                  name: {{ $.Release.Name }}-ozone
+              {{- with $.Values.configuration.filesFrom }}
+                {{- tpl (toYaml .) $ | nindent 14 }}
+              {{- end }}
+      restartPolicy: Never
+
+{{- range $dnode := $dnodes }}
+---
+apiVersion: batch/v1
+kind: Job
+metadata:
+  name: {{ printf "%s-helm-manager-decommission-%s" $.Release.Name $dnode }}
+  labels:
+    {{- include "ozone.labels" $ | nindent 4 }}
+    app.kubernetes.io/component: helm-manager
+  annotations:
+    "helm.sh/hook": post-upgrade
+    "helm.sh/hook-weight": "0"
+    "helm.sh/hook-delete-policy": hook-succeeded, hook-failed
+spec:
+  backoffLimit: {{ $.Values.helm.backoffLimit }}
+  template:
+    metadata:
+      labels:
+        {{- include "ozone.selectorLabels" $ | nindent 8 }}
+        app.kubernetes.io/component: helm-manager
+    spec:
+      containers:
+        - name: om-decommission
+          image: "{{ $.Values.image.repository }}:{{ $.Values.image.tag | default $.Chart.AppVersion }}"
+          imagePullPolicy: {{ $.Values.image.pullPolicy }}
+          {{- with $.Values.om.command }}
+          command: {{- tpl (toYaml .) $ | nindent 12 }}
+          {{- end }}
+          args:
+            - sh
+            - -c
+            - |
+              set -e
+              decommission_finalizer() {
+                  echo "Init decommission finalizer process..."
+                  while true; do
+                    IFS= read -r line;
+                    echo "$line"
+                    if echo "$line" | grep -q "Successfully decommissioned OM {{ $dnode }}"; then
+                      echo "{{ $dnode }} was successfully decommissioned!"
+                      if [ -d /old{{ $.Values.om.persistence.path }} ]; then
+                        echo "Delete old data on pvc to enable rescheduling 
without manual PVC deletion!"
+                        rm -rf  /old{{ $.Values.om.persistence.path }}/*
+                        echo "Data deleted!"
+                      fi

Review Comment:
   What are we trying to do here?
   
   Are we just deleting all the old data in the PVC? Don't we need to retain it for the new OM StatefulSets?
   
   If we want to delete all the data, can't we use `persistentVolumeClaimRetentionPolicy` to clean up the PVCs when the OM StatefulSets get scaled down and/or deleted?
   
   
https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#persistentvolumeclaim-retention
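   A sketch of the relevant StatefulSet stanza (requires the `StatefulSetAutoDeletePVC` feature, which is enabled by default in recent Kubernetes versions):
   
   ```yaml
   apiVersion: apps/v1
   kind: StatefulSet
   spec:
     persistentVolumeClaimRetentionPolicy:
       whenScaled: Delete   # remove the PVC when the StatefulSet is scaled down
       whenDeleted: Retain  # keep the data if the whole StatefulSet is deleted
   ```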



##########
charts/ozone/values.yaml:
##########
@@ -145,6 +145,49 @@ om:
     # The name of a specific storage class name to use
     storageClassName: ~
 
+# Storage Container Manager configuration
+scm:
+  # Number of Storage Container Manager replicas
+  replicas: 3

Review Comment:
   We should add some validation for the number of replicas in a follow-up 
task. 1 and 3 replicas make sense. Will we allow 2, 4 or more replicas?
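   For example, a guard in a template (or a `values.schema.json` rule) could enforce it:
   
   ```yaml
   {{- if not (has (int .Values.scm.replicas) (list 1 3)) }}
   {{- fail "scm.replicas must be 1 or 3" }}
   {{- end }}
   ```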
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

