[ https://issues.apache.org/jira/browse/FLINK-39805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18085013#comment-18085013 ]

Pavel Zeger commented on FLINK-39805:
-------------------------------------

Hi [~dciupitu], thanks for the review - you're right that a stock install isn't
affected. I didn't pay enough attention to the Dockerfile when I opened this.
On the Bug + Major classification - that was just the default when I created
the ticket; as far as I remember, I didn't set it myself.

I don't think you actually need to fork the image to end up with a non-UTF-8
charset. The Helm chart already exposes jvmArgs.operator and operatorPod.env
([https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/docs/operations/helm/]),
and whatever you put there reaches the JVM directly: the entrypoint passes
$JVM_ARGS to the java command, and the JVM also picks up JDK_JAVA_OPTIONS and
LANG from the environment on its own. So using only the standard chart
settings, someone can pass something like -Dfile.encoding=US-ASCII or set a
non-UTF-8 LANG, after which any non-ASCII text in the generated log-config or
pod-template files gets garbled. That makes it a latent problem reachable
through supported configuration, not something only a custom image would run
into.

I agree it's not a Major bug. Happy to reclassify it as an Improvement / 
hardening, or to close it - whatever you think keeps triage cleanest.

> FlinkConfigBuilder uses platform-default charset when writing 
> log/pod-template files
> ------------------------------------------------------------------------------------
>
>                 Key: FLINK-39805
>                 URL: https://issues.apache.org/jira/browse/FLINK-39805
>             Project: Flink
>          Issue Type: Bug
>          Components: Kubernetes Operator
>            Reporter: Pavel Zeger
>            Priority: Major
>
> `flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/config/FlinkConfigBuilder.java`,
>  three calls:
> {code:java}
> File log4jConfFile = new File(tmpDir.getAbsolutePath(), CONFIG_FILE_LOG4J_NAME);
> Files.write(log4jConfFile.toPath(), log4jConf.getBytes());
> File logbackConfFile = new File(tmpDir.getAbsolutePath(), CONFIG_FILE_LOGBACK_NAME);
> Files.write(logbackConfFile.toPath(), logbackConf.getBytes());
> final File tmp = File.createTempFile(GENERATED_FILE_PREFIX + "podTemplate_", ".yaml");
> Files.write(tmp.toPath(), Serialization.asYaml(podTemplate).getBytes());
> {code}
> `String.getBytes()` (no-arg) encodes using the JVM’s 
> Charset.defaultCharset(), which is environment-dependent. On most modern 
> Linux containers it happens to be UTF-8, but:
>  # On older Linux base images, and on container runtimes that don’t set a
> UTF-8 LANG (e.g. LANG=C.UTF-8), the default falls back to US-ASCII or ISO-8859-1.
>  # On Windows hosts the default is typically windows-1252 or another local
> code page.
>  # In a JVM started with -Dfile.encoding=<charset>, the result depends on
> whatever value the operator was started with.
> When this happens, any non-ASCII character in the user’s log4j.properties,
> logback.xml, or podTemplate.yaml (a UTF-8 emoji in a comment, an annotation
> containing a CJK character, an env value with diacritics, non-breaking spaces
> in YAML, etc.) can be corrupted.
> The pod template case is the most concerning. Users can legitimately add
> annotations or env values containing non-ASCII characters (internationalised
> tenant names, owner names with diacritics, region names, etc.; Kubernetes
> itself restricts label keys and values to ASCII). A corrupted YAML written to
> the temp file is then passed to Kubernetes, which either rejects it (best
> case) or silently accepts a corrupted value (worst case).
>  
> *Proposed fix*
>  # Always encode with UTF-8 explicitly, e.g. `getBytes(StandardCharsets.UTF_8)`.
>  # Enabling the SpotBugs DM_DEFAULT_ENCODING check in the build would prevent
> recurrence.
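
A minimal sketch of fix #1, assuming the change is simply to pass `StandardCharsets.UTF_8` to each of the three `getBytes` calls quoted above. `Utf8ConfigWriter` and `writeUtf8` are hypothetical names, not existing operator code.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class Utf8ConfigWriter {

    // Hypothetical helper: encodes explicitly as UTF-8, so the bytes on disk no
    // longer depend on Charset.defaultCharset() (file.encoding, LANG, ...).
    static Path writeUtf8(Path target, String content) throws IOException {
        return Files.write(target, content.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the podTemplate_*.yaml temp file the operator generates.
        Path tmp = Files.createTempFile("podTemplate_", ".yaml");
        try {
            // Unicode escapes (e and u with diaeresis) keep this source ASCII-only.
            String yaml = "metadata:\n  annotations:\n    owner: \"Zo\u00EB M\u00FCller\"\n";
            writeUtf8(tmp, yaml);
            byte[] onDisk = Files.readAllBytes(tmp);
            // Reading back as UTF-8 recovers the original text exactly.
            System.out.println(new String(onDisk, StandardCharsets.UTF_8).equals(yaml)); // prints "true"
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Because the charset is explicit, the file contents no longer depend on `file.encoding`, `LANG`, or `JDK_JAVA_OPTIONS`, and SpotBugs' DM_DEFAULT_ENCODING pattern would not flag these calls. On Java 11+, `Files.writeString(path, content, StandardCharsets.UTF_8)` is a one-call alternative.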



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
