[ https://issues.apache.org/jira/browse/FLINK-39805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Pavel Zeger updated FLINK-39805:
--------------------------------
Description:
In `flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/config/FlinkConfigBuilder.java`,
three calls write user-supplied configuration with the no-arg `String.getBytes()`:
{code:java}
File log4jConfFile = new File(tmpDir.getAbsolutePath(), CONFIG_FILE_LOG4J_NAME);
Files.write(log4jConfFile.toPath(), log4jConf.getBytes());
File logbackConfFile = new File(tmpDir.getAbsolutePath(), CONFIG_FILE_LOGBACK_NAME);
Files.write(logbackConfFile.toPath(), logbackConf.getBytes());
final File tmp = File.createTempFile(GENERATED_FILE_PREFIX + "podTemplate_", ".yaml");
Files.write(tmp.toPath(), Serialization.asYaml(podTemplate).getBytes());
{code}
The no-arg `String.getBytes()` encodes using the JVM’s `Charset.defaultCharset()`,
which is environment-dependent. On most modern Linux containers it happens to
be UTF-8, but:
# On older Linux base images, and on container runtimes that don’t set LANG to
a UTF-8 locale (LANG=*UTF-8), the default can fall back to US-ASCII or
ISO-8859-1.
# On Windows hosts the default is typically windows-1252 or another local code
page.
# In a JVM started with an explicit -Dfile.encoding= setting, the default is
whatever value the operator was launched with.
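When the default charset is not UTF-8, any non-ASCII character in the user’s
log4j.properties, logback.xml, or podTemplate.yaml (an emoji in a comment, an
internationalised label key, an annotation containing a CJK character,
non-breaking spaces in YAML, etc.) is corrupted on write: characters the
charset cannot represent are typically replaced with `?`, and the rest are
written in an encoding that readers expecting UTF-8 will misread.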
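
The effect can be reproduced outside the operator. The following is a minimal sketch, not code from the operator: the class name `DefaultCharsetDemo`, the helper `roundTripAsUtf8`, and the sample annotation value are all hypothetical. It passes the charset explicitly to simulate a JVM whose default is US-ASCII or ISO-8859-1, then decodes the bytes as UTF-8, as a consumer of the YAML file would.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of flink-kubernetes-operator.
final class DefaultCharsetDemo {

    /**
     * Encodes the text with writerCharset (standing in for the JVM default)
     * and decodes the resulting bytes as UTF-8, as a downstream reader would.
     */
    static String roundTripAsUtf8(String text, Charset writerCharset) {
        byte[] bytes = text.getBytes(writerCharset);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical annotation value; \u00eb is "e with diaeresis".
        String annotation = "owner: Zo\u00eb";

        System.out.println("JVM default charset: " + Charset.defaultCharset());
        // UTF-8 writer: the value survives unchanged.
        System.out.println(roundTripAsUtf8(annotation, StandardCharsets.UTF_8));
        // US-ASCII writer: the unmappable character becomes '?'.
        System.out.println(roundTripAsUtf8(annotation, StandardCharsets.US_ASCII));
        // ISO-8859-1 writer: the single byte is not valid UTF-8 when read back.
        System.out.println(roundTripAsUtf8(annotation, StandardCharsets.ISO_8859_1));
    }
}
```

Only the UTF-8 line round-trips. Which of the two failure modes a real deployment hits depends on the default charset of the JVM running the operator.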
The pod template case is the most concerning. Users frequently add annotations,
labels, or env values containing non-ASCII characters (legitimate use cases:
internationalised tenant labels, owner names with diacritics, region tags,
etc.). The corrupted YAML written to the temp file is then passed to Kubernetes,
which either rejects it (best case) or silently accepts a corrupted value
(worst case).
*Proposed fix*
# Always pass UTF-8 explicitly, for example `getBytes(StandardCharsets.UTF_8)`.
# Add the SpotBugs DM_DEFAULT_ENCODING rule to the project to prevent
recurrence.
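
The first item of the proposed fix could look roughly like the sketch below. It is an illustration under stated assumptions, not the actual patch: the class `Utf8FileWriter`, the helper `writeUtf8`, and the sample YAML are hypothetical, and the operator may prefer to inline the charset argument at each of the three call sites instead.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper; not part of flink-kubernetes-operator.
final class Utf8FileWriter {

    /** Writes the content as UTF-8 regardless of the JVM default charset. */
    static Path writeUtf8(Path target, String content) throws IOException {
        return Files.write(target, content.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical pod template fragment with a non-ASCII annotation value.
        String yaml = "metadata:\n  annotations:\n    owner: \"Zo\u00eb\"\n";

        Path tmp = Files.createTempFile("podTemplate_", ".yaml");
        try {
            writeUtf8(tmp, yaml);
            // Read back as UTF-8 to confirm the bytes on disk are UTF-8.
            String readBack = new String(Files.readAllBytes(tmp), StandardCharsets.UTF_8);
            System.out.println(readBack.equals(yaml));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Passing the charset at the `getBytes` call keeps the change local to the three lines quoted above and works on every Java version that has `java.nio.file.Files`.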
> FlinkConfigBuilder uses platform-default charset when writing log/pod-template files
> ------------------------------------------------------------------------------------
>
> Key: FLINK-39805
> URL: https://issues.apache.org/jira/browse/FLINK-39805
> Project: Flink
> Issue Type: Bug
> Components: Kubernetes Operator
> Reporter: Pavel Zeger
> Priority: Major