[ 
https://issues.apache.org/jira/browse/FLINK-39791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18084682#comment-18084682
 ] 

Dennis-Mircea Ciupitu commented on FLINK-39791:
-----------------------------------------------

Thanks for the careful write-up [~gerkelznik]. I confirm this is a real 
limitation. I hit the same wall while working on FLINK-39571 and noted it as a 
known limitation in the configuration page.

In my opinion the fix is to make the chart emit exactly one of 
{{flink-conf.yaml}} or {{config.yaml}} into the ConfigMap, chosen by which key 
the user populated under {{{}defaultConfiguration{}}}, and to default to 
{{flink-conf.yaml}} when neither is set so existing installs keep working. That 
keeps the resolution in Helm template logic (rather than touching upstream 
Flink {{GlobalConfiguration}} precedence), and it removes the dual-key 
rendering that makes the current behaviour silently wrong.

Implementing that requires three template changes, mapping to the three causes 
you identified:

 # Consolidate the chart-shipped defaults into the {{conf/}} directory and drop 
the chart-shipped overrides from {{values.yaml}}. Currently the chart ships 
defaults in two places: the framework-level {{conf/flink-conf.yaml}} (slot 
count, Java 17 opts), and operator-specific overrides hardcoded into 
{{values.yaml.defaultConfiguration.flink-conf.yaml}} (SLF4J reporter, reconcile 
interval, observer interval). The hardcoded copy is the one downstream 
consumers can't unset because of Helm's additive merge. After the fix, both 
layers live in the {{conf/}} directory (with {{conf/config.yaml}} carrying the 
nested-format equivalent), and {{values.yaml.defaultConfiguration}} is reduced 
to commented user-override examples that show downstream consumers what shape 
an override would take. After the fix, the {{defaultConfiguration}} block in 
{{values.yaml}} looks like this:
{{{}values.yaml{}}}. Currently the same defaults live in two places: 
{{helm/flink-kubernetes-operator/conf/flink-conf.yaml}} (the bundled file) and 
{{values.yaml.defaultConfiguration.flink-conf.yaml}} (a hardcoded override). 
The hardcoded copy is the one downstream consumers can't unset because of 
Helm's additive merge. After the fix, defaults stay only in the {{conf/}} 
directory, and the {{values.yaml.defaultConfiguration}} blocks are commented 
examples that show downstream consumers what shape an override would take. Also 
add a companion {{helm/flink-kubernetes-operator/conf/config.yaml}} defaults 
file in nested YAML 1.2 format alongside the existing 
{{{}conf/flink-conf.yaml{}}}, so the {{append}} branch in bullet 3 can seed 
format-matching framework defaults. After the fix, the {{defaultConfiguration}} 
block in {{values.yaml}} looks like this:
{code:yaml}
defaultConfiguration: 
  # If set to true, creates ConfigMaps/VolumeMounts. If set to false, no 
configuration will be created.
  # All below fields will be ignored if create is set to false.
  create: true
  # If set to true,
  #      (1) loads the built-in default configuration
  #      (2) appends the below flink-conf and logging configuration overrides
  # If set to false, loads just the overrides as in (2).
  # This option has not effect, if create is equal to false.
  append: true

  # Modern YAML 1.2 format (Flink standard parser), supports nested keys: 
  #  config.yaml: |+
  #    kubernetes.operator: 
  #      metrics.reporter.slf4j: 
  #        factory.class: org.apache.flink.metrics.slf4j.Slf4jReporterFactory
  #        interval: 5 MINUTE
  #      reconcile.interval: 15 s
  #      observer.progress-check.interval: 5 s
  
  # Legacy flat key-value format (Flink legacy parser):
  #  flink-conf.yaml: |+
  #    # Flink Config Overrides
  #    kubernetes.operator.metrics.reporter.slf4j.factory.class: 
org.apache.flink.metrics.slf4j.Slf4jReporterFactory
  #    kubernetes.operator.metrics.reporter.slf4j.interval: 5 MINUTE
  #
  #    kubernetes.operator.reconcile.interval: 15 s
  #    kubernetes.operator.observer.progress-check.interval: 5 s
{code}
 # In {{configmap.yaml}} and {{{}deployment.yaml{}}}, replace {{hasKey}} with 
truthiness checks against the resolved value, so downstream users can opt out 
by setting an empty string (the standard Helm idiom). The pattern is to use 
{{with}} so empty strings are skipped naturally:
{code:java}
# Before:
{{- if hasKey (.Values.defaultConfiguration) "config.yaml" }}
  {{- index (.Values.defaultConfiguration) "config.yaml" | nindent 4 -}}
{{- end }}

# After:
{{- with (index .Values.defaultConfiguration "config.yaml") }}
  {{- . | nindent 4 -}}
{{- end }}
{code}
# In {{configmap.yaml}}, make the two key blocks mutually exclusive based on 
which value resolves to non-empty. When both are set, prefer {{config.yaml}} as 
the more explicit modern choice. When neither is set, fall back to 
{{flink-conf.yaml}} so existing installs keep working.
{code}
# Before (configmap.yaml renders BOTH keys unconditionally):
data:
  config.yaml: |+
    [seeded defaults + user content]
  flink-conf.yaml: |+
    [seeded defaults + user content]

# After (configmap.yaml renders exactly one key based on user intent,
# with the seeded defaults sourced from the matching format file,
# and the watchNamespaces / operatorHealth chart-value injections
# factored into a shared partial in _helpers.tpl):
data:
{{- $configYaml := index .Values.defaultConfiguration "config.yaml" }}
{{- $flinkConf := index .Values.defaultConfiguration "flink-conf.yaml" }}
{{- if $configYaml }}
  config.yaml: |+
{{- if .Values.defaultConfiguration.append }}
    {{- $.Files.Get "conf/config.yaml" | nindent 4 -}}
{{- end }}
    {{- $configYaml | nindent 4 -}}
{{- include "flink-operator.chart-value-injections" . }}
{{- else }}
  flink-conf.yaml: |+
{{- if .Values.defaultConfiguration.append }}
    {{- $.Files.Get "conf/flink-conf.yaml" | nindent 4 -}}
{{- end }}
{{- with $flinkConf }}
    {{- . | nindent 4 -}}
{{- end }}
{{- include "flink-operator.chart-value-injections" . }}
{{- end }}
{code}
The new partial in {{helm/flink-kubernetes-operator/templates/_helpers.tpl}}:
{code}
{{- define "flink-operator.chart-value-injections" -}}
{{- if .Values.watchNamespaces }}
    kubernetes.operator.watched.namespaces: {{ join "," .Values.watchNamespaces 
 }}
{{- end }}
{{- if index .Values "operatorHealth" }}
    kubernetes.operator.health.probe.enabled: true
    kubernetes.operator.health.probe.port: {{ .Values.operatorHealth.port }}
{{- end }}
{{- end -}}
{code}

I had this in mind as a follow-up after FLINK-39571 lands, but you got here 
already with a good analysis. Do you want to tackle the implementation as a 
contribution? If you'd rather not, I'm happy to pick it up.

cc [~ameszaros] [~gyfora] 

> Helm chart silently ignores defaultConfiguration.config.yaml; always mounts 
> flink-conf.yaml from chart defaults
> ---------------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-39791
>                 URL: https://issues.apache.org/jira/browse/FLINK-39791
>             Project: Flink
>          Issue Type: Bug
>          Components: Kubernetes Operator
>    Affects Versions: kubernetes-operator-1.14.0, kubernetes-operator-1.15.0
>            Reporter: Gerk Elznik
>            Priority: Major
>              Labels: helm
>
> h2. Summary
> The Helm chart at {{helm/flink-kubernetes-operator/}} exposes a 
> {{defaultConfiguration.config.yaml}} parameter (showcased in {{values.yaml}} 
> per FLINK-38409 / PR #1022), but a downstream Helm consumer who sets it 
> cannot actually get {{config.yaml}} mounted into the operator pod. The chart 
> silently continues to mount {{{}flink-conf.yaml{}}}, populated only with the 
> chart's hardcoded defaults. The user's {{config.yaml}} content lands in the 
> {{flink-operator-config}} ConfigMap under the {{config.yaml}} key but is 
> never read by the operator JVM.
> h2. Root cause
> Two design choices interact:
> *1. {{helm/flink-kubernetes-operator/values.yaml}} hardcodes 
> {{defaultConfiguration.flink-conf.yaml}}* as a non-empty string (Flink-1.x 
> StatsD/SLF4J reporter config):
> {code:yaml}
> defaultConfiguration:   ...
>   flink-conf.yaml: |+
>     # Flink Config Overrides
>     kubernetes.operator.metrics.reporter.slf4j.factory.class: 
> org.apache.flink.metrics.slf4j.Slf4jReporterFactory
>     kubernetes.operator.metrics.reporter.slf4j.interval: 5 MINUTE
>     kubernetes.operator.reconcile.interval: 15 s
>     kubernetes.operator.observer.progress-check.interval: 5 s
> {code}
> *2. {{helm/flink-kubernetes-operator/templates/controller/deployment.yaml}} 
> selects the mounted file with:*
> {code:java}
> {{- if hasKey .Values.defaultConfiguration "flink-conf.yaml" }}
>   - key: flink-conf.yaml
>     path: flink-conf.yaml
> {{- else }}
>   - key: config.yaml
>     path: config.yaml
> {{- end }}
> {code}
> Helm value merging is additive on maps – a downstream -{{{}f 
> my-values.yaml{}}} or {{-set}} cannot remove a key that exists in the chart's 
> own {{{}values.yaml{}}}. Setting {{{}defaultConfiguration.flink-conf.yaml: 
> null{}}}, {{{}~{}}}, or {{""}} in a downstream
> values file still leaves the key present in {{.Values.defaultConfiguration}} 
> for {{hasKey}} purposes. So {{hasKey}} is always true and the {{config.yaml}} 
> branch of the conditional is unreachable for any normal consumer of the 
> published chart.
> h3. Compounding issue
> {{templates/controller/configmap.yaml}} unconditionally emits *both* 
> {{config.yaml}} and {{flink-conf.yaml}} keys into the rendered ConfigMap. So 
> even if a user works around the mount selection, the ConfigMap contains two 
> divergent files and the
> wrong one is authoritative based on what the Deployment volume's {{items}} 
> references.
> h3. Misleading showcase
> FLINK-38409 / PR #1022 added this comment to {{{}values.yaml{}}}:
> {quote} # Uncomment to use the new config.yaml format, but also comment out 
> the flink-config.yaml key.{quote}
> The instruction "comment out the flink-conf.yaml key" silently requires 
> editing the *chart's* own {{{}values.yaml{}}}, which downstream Helm 
> consumers cannot do without forking. The PR landed an undocumented workaround 
> as a documented feature.
> h2. Reproduction
>  # Install or {{helm template}} the published 1.15.0 chart with downstream 
> values that set {{defaultConfiguration.config.yaml}} to some Flink-2.x 
> structured-YAML content. Do *not* set 
> {{{}defaultConfiguration.flink-conf.yaml{}}}.
>  # Inspect the rendered Deployment: the 
> {{{}flink-operator-config-volume{}}}'s {{items}} references 
> {{{}flink-conf.yaml{}}}, not {{{}config.yaml{}}}.
>  # Inspect the rendered ConfigMap {{{}flink-operator-config{}}}: both 
> {{data.config.yaml}} and {{data.flink-conf.yaml}} are present. The user's 
> content is under {{{}config.yaml{}}}; the upstream defaults are under 
> {{{}flink-conf.yaml{}}}.
>  # The pod reads {{/opt/flink/conf/flink-conf.yaml}} and runs on upstream 
> defaults; the user's overrides are silently inert.
> h2. Expected behavior
> A user who provides {{defaultConfiguration.config.yaml}} and nothing for 
> {{defaultConfiguration.flink-conf.yaml}} should get:
>  * A Deployment that mounts {{{}config.yaml{}}}.
>  * A ConfigMap that does not include a stale, ignored {{flink-conf.yaml}} key 
> (or, if it must for back-compat, an explicitly empty one).
> h2. Suggested fixes (any one of)
>  # *Invert priority* in {{templates/controller/deployment.yaml}} – prefer 
> {{config.yaml}} when the user has explicitly populated it, fall back to 
> {{flink-conf.yaml}} only otherwise. Lossless for legacy users; gives 
> precedence to the explicit
> modern-format opt-in. Mirror the same priority in 
> {{templates/controller/configmap.yaml}} so only one key is emitted.
>  # *Drop the hardcoded default* for {{defaultConfiguration.flink-conf.yaml}} 
> in {{{}values.yaml{}}}. The same content is loaded from 
> {{conf/flink-conf.yaml}} via {{{}defaultConfiguration.append: true{}}}, so 
> nothing is lost; {{hasKey}} then actually
> discriminates based on user intent.
>  # *Treat empty values as absent* – replace {{hasKey ...}} with a check on 
> the resolved value (e.g. {{{}(index .Values.defaultConfiguration 
> "flink-conf.yaml"){}}}). Less ergonomic than #1 but the smallest template 
> change; lets downstream users
> effectively unset via {{{}flink-conf.yaml: ""{}}}.
> h2. Workaround
> Apply Kustomize patches downstream after {{{}helm template{}}}:
>  # {{op: remove}} the {{/data/flink-conf.yaml}} key from the rendered 
> ConfigMap.
>  # Strategic-merge the Deployment volume's {{items}} to reference 
> {{config.yaml}} plus the log4j (or logback) files.
> h2. Related tickets
>  * FLINK-35744 ("Support flink 1.19 config.yaml format", fixed in 1.12.0) – 
> added {{config.yaml}} support generally; did not address downstream override.
>  * FLINK-38409 ("Showcase in Helm chart values.yaml that config.yaml can be 
> used", merged via PR #1022) – added the misleading showcase comment.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to