[
https://issues.apache.org/jira/browse/FLINK-39791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18085126#comment-18085126
]
Dennis-Mircea Ciupitu commented on FLINK-39791:
-----------------------------------------------
All tweaks you proposed are good. The Flink 2.x point is the load-bearing one,
writing on disk as {{config.yaml}} dodges a regression my proposal would have
created on the eventual operator-runtime rebase onto +2.x.
Please proceed with the implementation for this. Thanks!
> Helm chart silently ignores defaultConfiguration.config.yaml; always mounts
> flink-conf.yaml from chart defaults
> ---------------------------------------------------------------------------------------------------------------
>
> Key: FLINK-39791
> URL: https://issues.apache.org/jira/browse/FLINK-39791
> Project: Flink
> Issue Type: Bug
> Components: Kubernetes Operator
> Affects Versions: kubernetes-operator-1.14.0, kubernetes-operator-1.15.0
> Reporter: Gerk Elznik
> Priority: Major
> Labels: helm
>
> h2. Summary
> The Helm chart at {{helm/flink-kubernetes-operator/}} exposes a
> {{defaultConfiguration.config.yaml}} parameter (showcased in {{values.yaml}}
> per FLINK-38409 / PR #1022), but a downstream Helm consumer who sets it
> cannot actually get {{config.yaml}} mounted into the operator pod. The chart
> silently continues to mount {{{}flink-conf.yaml{}}}, populated only with the
> chart's hardcoded defaults. The user's {{config.yaml}} content lands in the
> {{flink-operator-config}} ConfigMap under the {{config.yaml}} key but is
> never read by the operator JVM.
> h2. Root cause
> Two design choices interact:
> *1. {{helm/flink-kubernetes-operator/values.yaml}} hardcodes
> {{defaultConfiguration.flink-conf.yaml}}* as a non-empty string (Flink-1.x
> StatsD/SLF4J reporter config):
> {code:yaml}
> defaultConfiguration: ...
> flink-conf.yaml: |+
> # Flink Config Overrides
> kubernetes.operator.metrics.reporter.slf4j.factory.class:
> org.apache.flink.metrics.slf4j.Slf4jReporterFactory
> kubernetes.operator.metrics.reporter.slf4j.interval: 5 MINUTE
> kubernetes.operator.reconcile.interval: 15 s
> kubernetes.operator.observer.progress-check.interval: 5 s
> {code}
> *2. {{helm/flink-kubernetes-operator/templates/controller/deployment.yaml}}
> selects the mounted file with:*
> {code:java}
> {{- if hasKey .Values.defaultConfiguration "flink-conf.yaml" }}
> - key: flink-conf.yaml
> path: flink-conf.yaml
> {{- else }}
> - key: config.yaml
> path: config.yaml
> {{- end }}
> {code}
> Helm value merging is additive on maps – a downstream -{{{}f
> my-values.yaml{}}} or {{-set}} cannot remove a key that exists in the chart's
> own {{{}values.yaml{}}}. Setting {{{}defaultConfiguration.flink-conf.yaml:
> null{}}}, {{{}~{}}}, or {{""}} in a downstream
> values file still leaves the key present in {{.Values.defaultConfiguration}}
> for {{hasKey}} purposes. So {{hasKey}} is always true and the {{config.yaml}}
> branch of the conditional is unreachable for any normal consumer of the
> published chart.
> h3. Compounding issue
> {{templates/controller/configmap.yaml}} unconditionally emits *both*
> {{config.yaml}} and {{flink-conf.yaml}} keys into the rendered ConfigMap. So
> even if a user works around the mount selection, the ConfigMap contains two
> divergent files and the
> wrong one is authoritative based on what the Deployment volume's {{items}}
> references.
> h3. Misleading showcase
> FLINK-38409 / PR #1022 added this comment to {{{}values.yaml{}}}:
> {quote} # Uncomment to use the new config.yaml format, but also comment out
> the flink-config.yaml key.{quote}
> The instruction "comment out the flink-conf.yaml key" silently requires
> editing the *chart's* own {{{}values.yaml{}}}, which downstream Helm
> consumers cannot do without forking. The PR landed an undocumented workaround
> as a documented feature.
> h2. Reproduction
> # Install or {{helm template}} the published 1.15.0 chart with downstream
> values that set {{defaultConfiguration.config.yaml}} to some Flink-2.x
> structured-YAML content. Do *not* set
> {{{}defaultConfiguration.flink-conf.yaml{}}}.
> # Inspect the rendered Deployment: the
> {{{}flink-operator-config-volume{}}}'s {{items}} references
> {{{}flink-conf.yaml{}}}, not {{{}config.yaml{}}}.
> # Inspect the rendered ConfigMap {{{}flink-operator-config{}}}: both
> {{data.config.yaml}} and {{data.flink-conf.yaml}} are present. The user's
> content is under {{{}config.yaml{}}}; the upstream defaults are under
> {{{}flink-conf.yaml{}}}.
> # The pod reads {{/opt/flink/conf/flink-conf.yaml}} and runs on upstream
> defaults; the user's overrides are silently inert.
> h2. Expected behavior
> A user who provides {{defaultConfiguration.config.yaml}} and nothing for
> {{defaultConfiguration.flink-conf.yaml}} should get:
> * A Deployment that mounts {{{}config.yaml{}}}.
> * A ConfigMap that does not include a stale, ignored {{flink-conf.yaml}} key
> (or, if it must for back-compat, an explicitly empty one).
> h2. Suggested fixes (any one of)
> # *Invert priority* in {{templates/controller/deployment.yaml}} – prefer
> {{config.yaml}} when the user has explicitly populated it, fall back to
> {{flink-conf.yaml}} only otherwise. Lossless for legacy users; gives
> precedence to the explicit
> modern-format opt-in. Mirror the same priority in
> {{templates/controller/configmap.yaml}} so only one key is emitted.
> # *Drop the hardcoded default* for {{defaultConfiguration.flink-conf.yaml}}
> in {{{}values.yaml{}}}. The same content is loaded from
> {{conf/flink-conf.yaml}} via {{{}defaultConfiguration.append: true{}}}, so
> nothing is lost; {{hasKey}} then actually
> discriminates based on user intent.
> # *Treat empty values as absent* – replace {{hasKey ...}} with a check on
> the resolved value (e.g. {{{}(index .Values.defaultConfiguration
> "flink-conf.yaml"){}}}). Less ergonomic than #1 but the smallest template
> change; lets downstream users
> effectively unset via {{{}flink-conf.yaml: ""{}}}.
> h2. Workaround
> Apply Kustomize patches downstream after {{{}helm template{}}}:
> # {{op: remove}} the {{/data/flink-conf.yaml}} key from the rendered
> ConfigMap.
> # Strategic-merge the Deployment volume's {{items}} to reference
> {{config.yaml}} plus the log4j (or logback) files.
> h2. Related tickets
> * FLINK-35744 ("Support flink 1.19 config.yaml format", fixed in 1.12.0) –
> added {{config.yaml}} support generally; did not address downstream override.
> * FLINK-38409 ("Showcase in Helm chart values.yaml that config.yaml can be
> used", merged via PR #1022) – added the misleading showcase comment.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)