[ https://issues.apache.org/jira/browse/FLINK-38035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18010397#comment-18010397 ]
Niha commented on FLINK-38035:
------------------------------

[~rmetzger] Thank you very much for helping me with this. Would it be possible to have this fix backported to the 1.20.x branch as well? If so, could you guide me on the right process? Happy to help with the backport if needed. Thanks again.

> Security Vulnerability in PyFlink Logging Mechanism (PythonEnvUtils.java)
> -------------------------------------------------------------------------
>
>                 Key: FLINK-38035
>                 URL: https://issues.apache.org/jira/browse/FLINK-38035
>             Project: Flink
>          Issue Type: Bug
>          Components: API / Python
>    Affects Versions: 1.19.1, 1.20.1
>            Reporter: Niha
>            Assignee: Niha
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 2.2.0
>
>
> Potential security vulnerability in the logging statement within
> {{PythonEnvUtils.java}} that may expose environment variables, including
> Kubernetes-mounted secrets, during PyFlink job submission.
> The class
> [{{org.apache.flink.client.python.PythonEnvUtils}}|https://github.com/apache/flink/blob/master/flink-python/src/main/java/org/apache/flink/client/python/PythonEnvUtils.java#L372-L377]
> logs all environment variables at job startup with the following line:
>
> {{LOG.info("Starting Python process with environment variables: {}", environment);}}
> This line is problematic because it indiscriminately logs *all environment
> variables*, which may contain *sensitive credentials*.
> h4. *Context: Kubernetes Operator Users Are Especially at Risk*
> When Flink is deployed using the *Flink Kubernetes Operator*, secrets are
> commonly passed into pods as *environment variables* (via Kubernetes {{env}}
> or {{envFrom}} fields, e.g. from {{secretRef}}).
> This includes:
> * Database credentials
> * Cloud service keys (e.g., {{AWS_SECRET_ACCESS_KEY}})
> * Tokens and encryption keys
> * Custom user-defined secrets
> Logging these secrets in plain text within the Flink JobManager or
> TaskManager logs violates Kubernetes security best practices, which
> explicitly discourage exposing sensitive environment variables in logs, and
> poses a serious risk in production environments.
> h4. *Proposed Fix*
> * Redact known sensitive keys ({{SECRET}}, {{TOKEN}}, {{KEY}},
> {{PASSWORD}}, etc.) before logging.
> Example fix snippet:
> Map<String, String> redactedEnv = redactSensitive(environment);
> LOG.info("Starting Python process with environment variables: {}", redactedEnv);
> * Consider an opt-in mechanism (e.g., {{log.python.env=true}}) for full
> environment visibility in safe/test setups.
> h4. *Steps to Reproduce*
> # Set Kubernetes secrets as environment variables in a FlinkDeployment
> (e.g., via {{envFrom.secretRef}}).
> # Launch a PyFlink job using the Flink Kubernetes Operator.
> # Examine the JobManager logs.
> # Observe secrets printed via {{PythonEnvUtils.java}}.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
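
For reference, the {{redactSensitive(environment)}} call in the "Proposed Fix" snippet could look roughly like the sketch below. This is only an illustration under stated assumptions, not the change that was merged into Flink: the class name {{EnvRedactor}}, the mask string, and the exact marker list are hypothetical, and the matching rule (case-insensitive substring match on the variable name) is one possible reading of "redact known sensitive keys".

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper (not part of Flink): masks the values of environment
// variables whose names look sensitive, so the map is safe to log.
final class EnvRedactor {
    // Assumed marker list, taken from the key fragments named in the issue.
    private static final List<String> SENSITIVE_MARKERS =
            List.of("SECRET", "TOKEN", "KEY", "PASSWORD");

    // Assumed placeholder written in place of a redacted value.
    private static final String MASK = "******";

    private EnvRedactor() {}

    // Returns a copy of the environment; the input map is left unmodified.
    static Map<String, String> redactSensitive(Map<String, String> environment) {
        Map<String, String> redacted = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : environment.entrySet()) {
            String upperKey = entry.getKey().toUpperCase(Locale.ROOT);
            // A variable is treated as sensitive if its name contains any marker.
            boolean sensitive = SENSITIVE_MARKERS.stream().anyMatch(upperKey::contains);
            redacted.put(entry.getKey(), sensitive ? MASK : entry.getValue());
        }
        return redacted;
    }
}
```

With such a helper, the logging call would receive {{EnvRedactor.redactSensitive(environment)}} instead of {{environment}}. A substring match over-redacts harmless names that happen to contain a marker (anything containing {{KEY}}, for example), which seems the safer failure mode for a log statement.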