[
https://issues.apache.org/jira/browse/FLINK-39791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18084682#comment-18084682
]
Dennis-Mircea Ciupitu commented on FLINK-39791:
-----------------------------------------------
Thanks for the careful write-up [~gerkelznik]. I confirm this is a real
limitation. I hit the same wall while working on FLINK-39571 and noted it as a
known limitation in the configuration page.
In my opinion the fix is to make the chart emit exactly one of
{{flink-conf.yaml}} or {{config.yaml}} into the ConfigMap, chosen by which key
the user populated under {{{}defaultConfiguration{}}}, and to default to
{{flink-conf.yaml}} when neither is set so existing installs keep working. That
keeps the resolution in Helm template logic (rather than touching upstream
Flink {{GlobalConfiguration}} precedence), and it removes the dual-key
rendering that makes the current behaviour silently wrong.
Implementing that requires three template changes, mapping to the three causes
you identified:
# Consolidate the chart-shipped defaults into the {{conf/}} directory and drop
the chart-shipped overrides from {{values.yaml}}. Currently the chart ships
defaults in two places: the framework-level {{conf/flink-conf.yaml}} (slot
count, Java 17 opts), and operator-specific overrides hardcoded into
{{values.yaml.defaultConfiguration.flink-conf.yaml}} (SLF4J reporter, reconcile
interval, observer interval). The hardcoded copy is the one downstream
consumers can't unset because of Helm's additive merge. After the fix, both
layers live in the {{conf/}} directory (with {{conf/config.yaml}} carrying the
nested-format equivalent), and {{values.yaml.defaultConfiguration}} is reduced
to commented user-override examples that show downstream consumers what shape
an override would take. After the fix, the {{defaultConfiguration}} block in
{{values.yaml}} looks like this:
{{{}values.yaml{}}}. Currently the same defaults live in two places:
{{helm/flink-kubernetes-operator/conf/flink-conf.yaml}} (the bundled file) and
{{values.yaml.defaultConfiguration.flink-conf.yaml}} (a hardcoded override).
The hardcoded copy is the one downstream consumers can't unset because of
Helm's additive merge. After the fix, defaults stay only in the {{conf/}}
directory, and the {{values.yaml.defaultConfiguration}} blocks are commented
examples that show downstream consumers what shape an override would take. Also
add a companion {{helm/flink-kubernetes-operator/conf/config.yaml}} defaults
file in nested YAML 1.2 format alongside the existing
{{{}conf/flink-conf.yaml{}}}, so the {{append}} branch in bullet 3 can seed
format-matching framework defaults. After the fix, the {{defaultConfiguration}}
block in {{values.yaml}} looks like this:
{code:yaml}
defaultConfiguration:
# If set to true, creates ConfigMaps/VolumeMounts. If set to false, no
configuration will be created.
# All below fields will be ignored if create is set to false.
create: true
# If set to true,
# (1) loads the built-in default configuration
# (2) appends the below flink-conf and logging configuration overrides
# If set to false, loads just the overrides as in (2).
# This option has not effect, if create is equal to false.
append: true
# Modern YAML 1.2 format (Flink standard parser), supports nested keys:
# config.yaml: |+
# kubernetes.operator:
# metrics.reporter.slf4j:
# factory.class: org.apache.flink.metrics.slf4j.Slf4jReporterFactory
# interval: 5 MINUTE
# reconcile.interval: 15 s
# observer.progress-check.interval: 5 s
# Legacy flat key-value format (Flink legacy parser):
# flink-conf.yaml: |+
# # Flink Config Overrides
# kubernetes.operator.metrics.reporter.slf4j.factory.class:
org.apache.flink.metrics.slf4j.Slf4jReporterFactory
# kubernetes.operator.metrics.reporter.slf4j.interval: 5 MINUTE
#
# kubernetes.operator.reconcile.interval: 15 s
# kubernetes.operator.observer.progress-check.interval: 5 s
{code}
# In {{configmap.yaml}} and {{{}deployment.yaml{}}}, replace {{hasKey}} with
truthiness checks against the resolved value, so downstream users can opt out
by setting an empty string (the standard Helm idiom). The pattern is to use
{{with}} so empty strings are skipped naturally:
{code:java}
# Before:
{{- if hasKey (.Values.defaultConfiguration) "config.yaml" }}
{{- index (.Values.defaultConfiguration) "config.yaml" | nindent 4 -}}
{{- end }}
# After:
{{- with (index .Values.defaultConfiguration "config.yaml") }}
{{- . | nindent 4 -}}
{{- end }}
{code}
# In {{configmap.yaml}}, make the two key blocks mutually exclusive based on
which value resolves to non-empty. When both are set, prefer {{config.yaml}} as
the more explicit modern choice. When neither is set, fall back to
{{flink-conf.yaml}} so existing installs keep working.
{code}
# Before (configmap.yaml renders BOTH keys unconditionally):
data:
config.yaml: |+
[seeded defaults + user content]
flink-conf.yaml: |+
[seeded defaults + user content]
# After (configmap.yaml renders exactly one key based on user intent,
# with the seeded defaults sourced from the matching format file,
# and the watchNamespaces / operatorHealth chart-value injections
# factored into a shared partial in _helpers.tpl):
data:
{{- $configYaml := index .Values.defaultConfiguration "config.yaml" }}
{{- $flinkConf := index .Values.defaultConfiguration "flink-conf.yaml" }}
{{- if $configYaml }}
config.yaml: |+
{{- if .Values.defaultConfiguration.append }}
{{- $.Files.Get "conf/config.yaml" | nindent 4 -}}
{{- end }}
{{- $configYaml | nindent 4 -}}
{{- include "flink-operator.chart-value-injections" . }}
{{- else }}
flink-conf.yaml: |+
{{- if .Values.defaultConfiguration.append }}
{{- $.Files.Get "conf/flink-conf.yaml" | nindent 4 -}}
{{- end }}
{{- with $flinkConf }}
{{- . | nindent 4 -}}
{{- end }}
{{- include "flink-operator.chart-value-injections" . }}
{{- end }}
{code}
The new partial in {{helm/flink-kubernetes-operator/templates/_helpers.tpl}}:
{code}
{{- define "flink-operator.chart-value-injections" -}}
{{- if .Values.watchNamespaces }}
kubernetes.operator.watched.namespaces: {{ join "," .Values.watchNamespaces
}}
{{- end }}
{{- if index .Values "operatorHealth" }}
kubernetes.operator.health.probe.enabled: true
kubernetes.operator.health.probe.port: {{ .Values.operatorHealth.port }}
{{- end }}
{{- end -}}
{code}
I had this in mind as a follow-up after FLINK-39571 lands, but you got here
already with a good analysis. Do you want to tackle the implementation as a
contribution? If you'd rather not, I'm happy to pick it up.
cc [~ameszaros] [~gyfora]
> Helm chart silently ignores defaultConfiguration.config.yaml; always mounts
> flink-conf.yaml from chart defaults
> ---------------------------------------------------------------------------------------------------------------
>
> Key: FLINK-39791
> URL: https://issues.apache.org/jira/browse/FLINK-39791
> Project: Flink
> Issue Type: Bug
> Components: Kubernetes Operator
> Affects Versions: kubernetes-operator-1.14.0, kubernetes-operator-1.15.0
> Reporter: Gerk Elznik
> Priority: Major
> Labels: helm
>
> h2. Summary
> The Helm chart at {{helm/flink-kubernetes-operator/}} exposes a
> {{defaultConfiguration.config.yaml}} parameter (showcased in {{values.yaml}}
> per FLINK-38409 / PR #1022), but a downstream Helm consumer who sets it
> cannot actually get {{config.yaml}} mounted into the operator pod. The chart
> silently continues to mount {{{}flink-conf.yaml{}}}, populated only with the
> chart's hardcoded defaults. The user's {{config.yaml}} content lands in the
> {{flink-operator-config}} ConfigMap under the {{config.yaml}} key but is
> never read by the operator JVM.
> h2. Root cause
> Two design choices interact:
> *1. {{helm/flink-kubernetes-operator/values.yaml}} hardcodes
> {{defaultConfiguration.flink-conf.yaml}}* as a non-empty string (Flink-1.x
> StatsD/SLF4J reporter config):
> {code:yaml}
> defaultConfiguration: ...
> flink-conf.yaml: |+
> # Flink Config Overrides
> kubernetes.operator.metrics.reporter.slf4j.factory.class:
> org.apache.flink.metrics.slf4j.Slf4jReporterFactory
> kubernetes.operator.metrics.reporter.slf4j.interval: 5 MINUTE
> kubernetes.operator.reconcile.interval: 15 s
> kubernetes.operator.observer.progress-check.interval: 5 s
> {code}
> *2. {{helm/flink-kubernetes-operator/templates/controller/deployment.yaml}}
> selects the mounted file with:*
> {code:java}
> {{- if hasKey .Values.defaultConfiguration "flink-conf.yaml" }}
> - key: flink-conf.yaml
> path: flink-conf.yaml
> {{- else }}
> - key: config.yaml
> path: config.yaml
> {{- end }}
> {code}
> Helm value merging is additive on maps – a downstream -{{{}f
> my-values.yaml{}}} or {{-set}} cannot remove a key that exists in the chart's
> own {{{}values.yaml{}}}. Setting {{{}defaultConfiguration.flink-conf.yaml:
> null{}}}, {{{}~{}}}, or {{""}} in a downstream
> values file still leaves the key present in {{.Values.defaultConfiguration}}
> for {{hasKey}} purposes. So {{hasKey}} is always true and the {{config.yaml}}
> branch of the conditional is unreachable for any normal consumer of the
> published chart.
> h3. Compounding issue
> {{templates/controller/configmap.yaml}} unconditionally emits *both*
> {{config.yaml}} and {{flink-conf.yaml}} keys into the rendered ConfigMap. So
> even if a user works around the mount selection, the ConfigMap contains two
> divergent files and the
> wrong one is authoritative based on what the Deployment volume's {{items}}
> references.
> h3. Misleading showcase
> FLINK-38409 / PR #1022 added this comment to {{{}values.yaml{}}}:
> {quote} # Uncomment to use the new config.yaml format, but also comment out
> the flink-config.yaml key.{quote}
> The instruction "comment out the flink-conf.yaml key" silently requires
> editing the *chart's* own {{{}values.yaml{}}}, which downstream Helm
> consumers cannot do without forking. The PR landed an undocumented workaround
> as a documented feature.
> h2. Reproduction
> # Install or {{helm template}} the published 1.15.0 chart with downstream
> values that set {{defaultConfiguration.config.yaml}} to some Flink-2.x
> structured-YAML content. Do *not* set
> {{{}defaultConfiguration.flink-conf.yaml{}}}.
> # Inspect the rendered Deployment: the
> {{{}flink-operator-config-volume{}}}'s {{items}} references
> {{{}flink-conf.yaml{}}}, not {{{}config.yaml{}}}.
> # Inspect the rendered ConfigMap {{{}flink-operator-config{}}}: both
> {{data.config.yaml}} and {{data.flink-conf.yaml}} are present. The user's
> content is under {{{}config.yaml{}}}; the upstream defaults are under
> {{{}flink-conf.yaml{}}}.
> # The pod reads {{/opt/flink/conf/flink-conf.yaml}} and runs on upstream
> defaults; the user's overrides are silently inert.
> h2. Expected behavior
> A user who provides {{defaultConfiguration.config.yaml}} and nothing for
> {{defaultConfiguration.flink-conf.yaml}} should get:
> * A Deployment that mounts {{{}config.yaml{}}}.
> * A ConfigMap that does not include a stale, ignored {{flink-conf.yaml}} key
> (or, if it must for back-compat, an explicitly empty one).
> h2. Suggested fixes (any one of)
> # *Invert priority* in {{templates/controller/deployment.yaml}} – prefer
> {{config.yaml}} when the user has explicitly populated it, fall back to
> {{flink-conf.yaml}} only otherwise. Lossless for legacy users; gives
> precedence to the explicit
> modern-format opt-in. Mirror the same priority in
> {{templates/controller/configmap.yaml}} so only one key is emitted.
> # *Drop the hardcoded default* for {{defaultConfiguration.flink-conf.yaml}}
> in {{{}values.yaml{}}}. The same content is loaded from
> {{conf/flink-conf.yaml}} via {{{}defaultConfiguration.append: true{}}}, so
> nothing is lost; {{hasKey}} then actually
> discriminates based on user intent.
> # *Treat empty values as absent* – replace {{hasKey ...}} with a check on
> the resolved value (e.g. {{{}(index .Values.defaultConfiguration
> "flink-conf.yaml"){}}}). Less ergonomic than #1 but the smallest template
> change; lets downstream users
> effectively unset via {{{}flink-conf.yaml: ""{}}}.
> h2. Workaround
> Apply Kustomize patches downstream after {{{}helm template{}}}:
> # {{op: remove}} the {{/data/flink-conf.yaml}} key from the rendered
> ConfigMap.
> # Strategic-merge the Deployment volume's {{items}} to reference
> {{config.yaml}} plus the log4j (or logback) files.
> h2. Related tickets
> * FLINK-35744 ("Support flink 1.19 config.yaml format", fixed in 1.12.0) –
> added {{config.yaml}} support generally; did not address downstream override.
> * FLINK-38409 ("Showcase in Helm chart values.yaml that config.yaml can be
> used", merged via PR #1022) – added the misleading showcase comment.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)