Copilot commented on code in PR #17946:
URL: https://github.com/apache/pinot/pull/17946#discussion_r2977522263
##########
helm/pinot/values.yaml:
##########
@@ -754,17 +754,18 @@ minionStateless:
 # ------------------------------------------------------------------------------
 zookeeper:
-  ## If true, install the Zookeeper chart alongside Pinot
-  ## ref: https://github.com/bitnami/charts/tree/master/bitnami/zookeeper
+  ## If true, install the ZooKeeper StatefulSet alongside Pinot
+  ## Uses the official Apache ZooKeeper Docker image
+  ## ref: https://hub.docker.com/_/zookeeper
   enabled: true
-  ## If the Zookeeper Chart is disabled a URL override is required to connect
+  ## If the ZooKeeper chart is disabled a URL override is required to connect

Review Comment:
   The comment says “If the ZooKeeper chart is disabled…”, but ZooKeeper is no longer a subchart after this migration. Consider rewording to “If ZooKeeper is disabled…” to match the new deployment model and avoid confusion.

```suggestion
  ## If ZooKeeper is disabled a URL override is required to connect
```

##########
helm/pinot/UPGRADING.md:
##########
@@ -0,0 +1,124 @@
+<!--
+
+    Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements. See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership. The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License. You may obtain a copy of the License at
+
+        http://www.apache.org/licenses/LICENSE-2.0
+
+    Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied. See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+-->
+
+# Upgrading the Pinot Helm Chart
+
+## From 0.x to 1.0.0
+
+Version 1.0.0 replaces the Bitnami ZooKeeper subchart with native Helm
+templates using the [official Apache ZooKeeper Docker image](https://hub.docker.com/_/zookeeper).
+This is a **breaking change** that requires manual intervention when upgrading
+existing deployments.
+
+### Why is this breaking?
+
+1. **Immutable StatefulSet selectors**: The Bitnami chart uses
+   `app.kubernetes.io/name: zookeeper` labels, while the new templates use
+   `app: pinot, component: zookeeper`. StatefulSet selectors cannot be changed
+   in-place, so `helm upgrade` will fail.
+
+2. **Different data paths**: The Bitnami image stores data at
+   `/bitnami/zookeeper`, while the official image uses `/data`. Even though the
+   PVC name (`data`) is the same, the new container will not find existing data.
+
+3. **Removed Bitnami-specific values**: Options like `zookeeper.image.registry`,
+   `zookeeper.global.security.allowInsecureImages`, `zookeeper.tls.*`, and
+   `zookeeper.auth.*` no longer apply.
+
+### Migration steps
+
+> **Important**: ZooKeeper stores Pinot cluster metadata (table configs, schemas,
+> segment assignments). Losing this data means the Pinot cluster will need to be
+> reconfigured. Plan accordingly.
+
+#### Option A: Fresh ZooKeeper (simplest, requires Pinot reconfiguration)
+
+```bash
+NAMESPACE=pinot-quickstart
+RELEASE=pinot
+
+# 1. Delete the old ZooKeeper StatefulSet (pods will be terminated)
+kubectl delete statefulset ${RELEASE}-zookeeper -n ${NAMESPACE}
+
+# 2. Delete old ZooKeeper PVCs
+kubectl delete pvc -l app.kubernetes.io/name=zookeeper -n ${NAMESPACE}
+
+# 3. Upgrade the Helm release
+helm upgrade ${RELEASE} -n ${NAMESPACE} ./helm/pinot
+
+# 4. Recreate your Pinot tables and schemas
+```
+
+#### Option B: Migrate ZooKeeper data (preserves Pinot metadata)
+
+```bash
+NAMESPACE=pinot-quickstart
+RELEASE=pinot
+
+# 1. Export ZooKeeper data while the old cluster is still running
+kubectl exec -n ${NAMESPACE} ${RELEASE}-zookeeper-0 -- \
+  bash -c 'cd /apache-zookeeper-*/bin && ./zkCli.sh -server localhost:2181 \
+  getAll / > /tmp/zk-backup.txt'
+
+# 2. Delete the old ZooKeeper StatefulSet (keeps PVCs)
+kubectl delete statefulset ${RELEASE}-zookeeper -n ${NAMESPACE} --cascade=orphan
+kubectl delete pod ${RELEASE}-zookeeper-0 -n ${NAMESPACE}
+
+# 3. Delete old PVCs (data path is incompatible)
+kubectl delete pvc -l app.kubernetes.io/name=zookeeper -n ${NAMESPACE}

Review Comment:
   Option B is labeled as preserving Pinot metadata, but the steps only export ZooKeeper data and then delete the PVCs without any restore/import step. Also, step 2 says it “keeps PVCs” but step 3 deletes them. Either add concrete restore instructions (using the exported backup) or reword this option to reflect that it does not preserve metadata as written.

##########
helm/pinot/templates/zookeeper/statefulset.yaml:
##########
@@ -0,0 +1,193 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+
+{{- if .Values.zookeeper.enabled }}
+apiVersion: apps/v1
+kind: StatefulSet
+metadata:
+  name: {{ include "pinot.zookeeper.fullname" . }}
+  namespace: {{ include "pinot.namespace" . }}
+  labels:
+    {{- include "pinot.zookeeperLabels" . | nindent 4 }}
+spec:
+  selector:
+    matchLabels:
+      {{- include "pinot.zookeeperMatchLabels" . | nindent 6 }}
+  serviceName: {{ include "pinot.zookeeper.headless" . }}
+  replicas: {{ .Values.zookeeper.replicaCount }}
+  updateStrategy:
+    type: RollingUpdate
+  podManagementPolicy: Parallel
+  template:
+    metadata:
+      labels:
+        {{- include "pinot.zookeeperLabels" . | nindent 8 }}
+      {{- with .Values.zookeeper.podAnnotations }}
+      annotations:
+        {{- toYaml . | nindent 8 }}
+      {{- end }}
+    spec:
+      {{- with .Values.imagePullSecrets }}
+      imagePullSecrets:
+        {{- toYaml . | nindent 8 }}
+      {{- end }}
+      {{- with .Values.zookeeper.nodeSelector }}
+      nodeSelector:
+        {{- toYaml . | nindent 8 }}
+      {{- end }}
+      {{- with .Values.zookeeper.affinity }}
+      affinity:
+        {{- toYaml . | nindent 8 }}
+      {{- end }}
+      {{- with .Values.zookeeper.tolerations }}
+      tolerations:
+        {{- toYaml . | nindent 8 }}
+      {{- end }}
+      {{- with .Values.zookeeper.securityContext }}
+      securityContext:
+        {{- toYaml . | nindent 8 }}
+      {{- end }}
+      containers:
+      - name: zookeeper
+        image: "{{ .Values.zookeeper.image.repository }}:{{ .Values.zookeeper.image.tag }}"
+        imagePullPolicy: {{ .Values.zookeeper.image.pullPolicy }}
+        env:
+        - name: ZOO_MY_ID
+          valueFrom:
+            fieldRef:
+              fieldPath: metadata.labels['apps.kubernetes.io/pod-index']
+        - name: ZOO_SERVERS
+          value: {{ include "pinot.zookeeper.servers" . | quote }}
+        - name: ZOO_TICK_TIME
+          value: {{ .Values.zookeeper.tickTime | default 2000 | quote }}
+        - name: ZOO_INIT_LIMIT
+          value: {{ .Values.zookeeper.initLimit | default 10 | quote }}
+        - name: ZOO_SYNC_LIMIT
+          value: {{ .Values.zookeeper.syncLimit | default 5 | quote }}
+        - name: ZOO_MAX_CLIENT_CNXNS
+          value: {{ .Values.zookeeper.maxClientCnxns | default 60 | quote }}
+        - name: ZOO_AUTOPURGE_PURGE_INTERVAL
+          value: {{ .Values.zookeeper.autopurge.purgeInterval | default 1 | quote }}
+        - name: ZOO_AUTOPURGE_SNAP_RETAIN_COUNT
+          value: {{ .Values.zookeeper.autopurge.snapRetainCount | default 5 | quote }}
+        - name: JVMFLAGS
+          value: "-Xmx{{ .Values.zookeeper.heapSize }}m -Xms{{ .Values.zookeeper.heapSize }}m {{ .Values.zookeeper.jvmFlags }}"
+        - name: ZOO_4LW_COMMANDS_WHITELIST
+          value: {{ .Values.zookeeper.fourLetterWordsWhitelist | default "srvr,mntr,ruok,conf" | quote }}
+        {{- with .Values.zookeeper.extraEnv }}
+        {{- toYaml . | nindent 10 }}
+        {{- end }}
+        command:
+        - bash
+        - -c
+        - |
+          # The official ZooKeeper image uses 1-based IDs, but pod-index is 0-based
+          HOST=$(hostname -s)
+          if [[ $HOST =~ (.*)-([0-9]+)$ ]]; then
+            export ZOO_MY_ID=$(( ${BASH_REMATCH[2]} + 1 ))
+          else
+            echo "Failed to extract ordinal from hostname $HOST"
+            exit 1
+          fi
+          exec /docker-entrypoint.sh zkServer.sh start-foreground
+        ports:
+        - name: client
+          containerPort: {{ .Values.zookeeper.port }}
+        - name: follower
+          containerPort: 2888
+        - name: election
+          containerPort: 3888
+        {{- if .Values.zookeeper.adminServer.enabled }}
+        - name: admin
+          containerPort: {{ .Values.zookeeper.adminServer.port | default 8080 }}
+        {{- end }}
+        {{- if .Values.zookeeper.probes.livenessEnabled }}
+        livenessProbe:
+          exec:
+            command:
+            - bash
+            - -c
+            - echo ruok | nc localhost {{ .Values.zookeeper.port }} | grep imok
+          initialDelaySeconds: {{ .Values.zookeeper.probes.liveness.initialDelaySeconds | default 30 }}
+          periodSeconds: {{ .Values.zookeeper.probes.liveness.periodSeconds | default 10 }}
+          timeoutSeconds: {{ .Values.zookeeper.probes.liveness.timeoutSeconds | default 5 }}
+          failureThreshold: {{ .Values.zookeeper.probes.liveness.failureThreshold | default 6 }}
+        {{- end }}
+        {{- if .Values.zookeeper.probes.readinessEnabled }}
+        readinessProbe:
+          exec:
+            command:
+            - bash
+            - -c
+            - echo ruok | nc localhost {{ .Values.zookeeper.port }} | grep imok
+          initialDelaySeconds: {{ .Values.zookeeper.probes.readiness.initialDelaySeconds | default 10 }}
+          periodSeconds: {{ .Values.zookeeper.probes.readiness.periodSeconds | default 10 }}
+          timeoutSeconds: {{ .Values.zookeeper.probes.readiness.timeoutSeconds | default 5 }}
+          failureThreshold: {{ .Values.zookeeper.probes.readiness.failureThreshold | default 6 }}
+        {{- end }}
+        volumeMounts:
+        - name: data
+          mountPath: /data
+        - name: datalog
+          mountPath: /datalog
+        {{- with .Values.zookeeper.resources }}
+        resources:
+          {{- toYaml . | nindent 10 }}
+        {{- end }}
+{{- if not .Values.zookeeper.persistence.enabled }}
+      volumes:
+      - name: data
+        emptyDir: {}
+      - name: datalog
+        emptyDir: {}
+{{- end }}
+{{- if .Values.zookeeper.persistence.enabled }}
+  volumeClaimTemplates:
+  - metadata:
+      name: data
+    spec:
+      accessModes:
+      - {{ .Values.zookeeper.persistence.accessMode | default "ReadWriteOnce" | quote }}
+      {{- if .Values.zookeeper.persistence.storageClass }}
+      {{- if (eq "-" .Values.zookeeper.persistence.storageClass) }}
+      storageClassName: ""
+      {{- else }}
+      storageClassName: {{ .Values.zookeeper.persistence.storageClass }}
+      {{- end }}
+      {{- end }}
+      resources:
+        requests:
+          storage: {{ .Values.zookeeper.persistence.data.size | default .Values.zookeeper.persistence.size | quote }}
+  - metadata:
+      name: datalog
+    spec:
+      accessModes:
+      - {{ .Values.zookeeper.persistence.accessMode | default "ReadWriteOnce" | quote }}
+      {{- if .Values.zookeeper.persistence.storageClass }}
+      {{- if (eq "-" .Values.zookeeper.persistence.storageClass) }}
+      storageClassName: ""
+      {{- else }}
+      storageClassName: {{ .Values.zookeeper.persistence.storageClass }}
+      {{- end }}
+      {{- end }}
+      resources:
+        requests:
+          storage: {{ .Values.zookeeper.persistence.datalog.size | default "8Gi" | quote }}

Review Comment:
   In `volumeClaimTemplates` for the `datalog` PVC, the requested storage defaults to a hard-coded "8Gi" rather than following the chart’s `zookeeper.persistence.size` default (as the values.yaml comment indicates). This makes `persistence.size` only apply to the `data` PVC unless users also set `persistence.datalog.size` explicitly.

```suggestion
          storage: {{ .Values.zookeeper.persistence.datalog.size | default .Values.zookeeper.persistence.size | quote }}
```

##########
helm/pinot/templates/zookeeper/statefulset.yaml:
##########
+        ports:
+        - name: client
+          containerPort: {{ .Values.zookeeper.port }}
+        - name: follower
+          containerPort: 2888
+        - name: election
+          containerPort: 3888
+        {{- if .Values.zookeeper.adminServer.enabled }}
+        - name: admin
+          containerPort: {{ .Values.zookeeper.adminServer.port | default 8080 }}
+        {{- end }}

Review Comment:
   `zookeeper.adminServer.enabled` currently only controls whether the admin port is listed under `container.ports`; it doesn’t actually enable/disable or configure the ZooKeeper AdminServer in the container itself. This can be misleading because users may expect toggling this value to affect the server behavior, not just port metadata/service exposure.

##########
helm/pinot/templates/_helpers.tpl:
##########
@@ -199,6 +199,46 @@ else use user-provided URL
 {{- end -}}
 {{- end -}}
 
+{{/*
+Zookeeper labels
+*/}}
+{{- define "pinot.zookeeperLabels" -}}
+{{- include "pinot.labels" . }}
+component: zookeeper
+{{- end }}
+
+{{/*
+Zookeeper Match Selector labels
+*/}}
+{{- define "pinot.zookeeperMatchLabels" -}}
+{{- include "pinot.matchLabels" . }}
+component: zookeeper
+{{- end }}
+
+{{/*
+The name of the zookeeper headless service.
+*/}}
+{{- define "pinot.zookeeper.headless" -}}
+{{- printf "%s-headless" (include "pinot.zookeeper.fullname" .) | trunc 63 | trimSuffix "-" -}}
+{{- end -}}
+
+{{/*
+Generate the ZOO_SERVERS string for the official ZooKeeper Docker image.
+Format: server.1=<pod-0>.<headless>:2888:3888;2181 server.2=...
+*/}}
+{{- define "pinot.zookeeper.servers" -}}
+{{- $fullname := (include "pinot.zookeeper.fullname" .) -}}
+{{- $headless := (include "pinot.zookeeper.headless" .) -}}
+{{- $namespace := (include "pinot.namespace" .) -}}
+{{- $port := .Values.zookeeper.port | int -}}
+{{- $replicas := .Values.zookeeper.replicaCount | int -}}
+{{- $servers := list -}}
+{{- range $i := until $replicas -}}
+{{- $servers = append $servers (printf "server.%d=%s-%d.%s.%s.svc.cluster.local:2888:3888;%d" (add $i 1) $fullname $i $headless $namespace $port) -}}

Review Comment:
   `pinot.zookeeper.servers` hard-codes the cluster DNS suffix to `.svc.cluster.local`. Kubernetes cluster domains are configurable, so this can break name resolution in clusters that don’t use `cluster.local`. Prefer using the short form (`<pod>.<headless>.<namespace>`) or make the cluster domain configurable via chart values.
```suggestion
{{- $servers = append $servers (printf "server.%d=%s-%d.%s.%s:2888:3888;%d" (add $i 1) $fullname $i $headless $namespace $port) -}}
```

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
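Editor's note: the ordinal-to-myid mapping quoted in the statefulset `command` block can be exercised outside the container. A minimal sketch, assuming bash; `pinot-zookeeper-2` is an illustrative StatefulSet pod hostname, not chart output:

```shell
#!/usr/bin/env bash
# Sketch of the hostname-to-myid derivation from the statefulset command block.
# In the pod, HOST would come from `hostname -s`; here it is hard-coded.
HOST=pinot-zookeeper-2
if [[ $HOST =~ (.*)-([0-9]+)$ ]]; then
  # Pod ordinals are 0-based; the official image expects 1-based myid values.
  ZOO_MY_ID=$(( ${BASH_REMATCH[2]} + 1 ))
else
  echo "Failed to extract ordinal from hostname $HOST" >&2
  exit 1
fi
echo "$ZOO_MY_ID"   # pinot-zookeeper-2 maps to myid 3
```

Note that the regex anchors on the trailing `-<digits>`, so it also behaves for release names that themselves contain hyphens or digits (e.g. `my-pinot-2-zookeeper-0`).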
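Editor's note: to make the effect of the suggested short-form change concrete, here is a sketch in plain bash of the `ZOO_SERVERS` string `pinot.zookeeper.servers` would render for three replicas without the hard-coded `.svc.cluster.local` suffix. The release and namespace names are illustrative assumptions:

```shell
#!/usr/bin/env bash
# Sketch: ZOO_SERVERS as the helper would render it after the suggested change
# (short-form pod DNS names: <pod>.<headless>.<namespace>).
fullname=pinot-zookeeper
headless=pinot-zookeeper-headless
namespace=pinot-quickstart
port=2181
replicas=3

servers=""
for i in $(seq 0 $((replicas - 1))); do
  # server IDs are 1-based, pod ordinals are 0-based
  servers+=" server.$((i + 1))=${fullname}-${i}.${headless}.${namespace}:2888:3888;${port}"
done
servers=${servers# }   # trim the leading space
echo "$servers"
```

Short-form names resolve through the cluster's search domains, so they keep working whatever cluster domain the cluster was configured with.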
