Shruti Singhania created FLINK-37932:
----------------------------------------

             Summary: Set a Default User Agent for GCS FileSystem Connector for Better Observability
                 Key: FLINK-37932
                 URL: https://issues.apache.org/jira/browse/FLINK-37932
             Project: Flink
          Issue Type: Improvement
          Components: Connectors / Hadoop Compatibility, FileSystems
            Reporter: Shruti Singhania


*1. Problem Statement*

When Apache Flink interacts with Google Cloud Storage (GCS) via the {{flink-gs-fs-hadoop}} connector, the requests made to the GCS API do not contain any Flink-specific identifiers by default. While users can manually configure a user agent suffix via the {{fs.gs.user.agent.suffix}} property in {{core-site.xml}}, most users do not.

This lack of a default identifier makes it difficult for users and cloud administrators to distinguish Flink-originated GCS traffic from other applications in GCP Cloud Audit Logs. This complicates monitoring, debugging, and cost attribution for Flink jobs operating in a cloud-native environment.

*2. Proposed Solution*

This proposal is to programmatically set a default user agent suffix within the Flink GCS filesystem factory. This suffix would be added _only if_ one is not already provided by the user in their configuration.

The proposed default user agent suffix will include the Flink version, for example: {{Apache Flink 1.19.0}}.

This provides a "sensible default" that enhances the experience for users running Flink on Google Cloud, while fully respecting any custom configuration.

*3. Implementation Details*

The change would be implemented in the {{flink-gs-fs-hadoop}} module.
 * *Module:* {{flink-gs-fs-hadoop}}
 * *Class:* {{org.apache.flink.fs.gs.GSFileSystemFactory}}
 * *Method:* {{create(URI fsUri)}}

The implementation logic is as follows:
 # In the {{create}} method, after loading the Hadoop configuration, check if the {{fs.gs.user.agent.suffix}} key is already set.
 # If the key is not set (i.e., its value is {{null}}), programmatically set it on the {{Configuration}} object. The value should be dynamically generated using {{EnvironmentInformation.getVersion()}} to ensure version accuracy.
 # If the key is already set, do nothing, preserving the user's configuration.
 # Proceed with {{FileSystem}} instantiation using the (now guaranteed to be populated) {{Configuration}} object.

*4. Impact on Users*
 * This is a *non-breaking change*.
 * Users who have already configured a custom {{fs.gs.user.agent.suffix}} will see no difference in behavior.
 * Users who have _not_ configured this property will automatically gain improved observability in their GCP logs without needing to make any changes.
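
The "set only if absent" logic from the implementation steps can be sketched in isolation as follows. This is a minimal illustration, not the actual patch: the class {{UserAgentDefaultSketch}}, the method {{applyDefaultUserAgent}}, and the nested {{Conf}} class are hypothetical names, {{Conf}} is a stand-in that mimics only the {{get}}/{{set}} calls of the Hadoop {{Configuration}} object, and the version string is passed in as a parameter where the real change would presumably call {{EnvironmentInformation.getVersion()}}.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of the proposed default; all names here are hypothetical. */
public class UserAgentDefaultSketch {

    /** Configuration key named in the proposal. */
    public static final String USER_AGENT_SUFFIX_KEY = "fs.gs.user.agent.suffix";

    /** Minimal stand-in for the Hadoop Configuration object (get/set only). */
    public static final class Conf {
        private final Map<String, String> values = new HashMap<>();

        /** Returns the configured value, or null if the key is not set. */
        public String get(String key) {
            return values.get(key);
        }

        public void set(String key, String value) {
            values.put(key, value);
        }
    }

    /**
     * Sets the default suffix only when the user has not configured one.
     * The flinkVersion argument is an assumption standing in for the value
     * of EnvironmentInformation.getVersion().
     */
    public static void applyDefaultUserAgent(Conf conf, String flinkVersion) {
        if (conf.get(USER_AGENT_SUFFIX_KEY) == null) {
            conf.set(USER_AGENT_SUFFIX_KEY, "Apache Flink " + flinkVersion);
        }
        // Otherwise do nothing, so the user's own suffix is preserved.
    }

    public static void main(String[] args) {
        // Case 1: nothing configured, so the default is applied.
        Conf unset = new Conf();
        applyDefaultUserAgent(unset, "1.19.0");
        System.out.println(unset.get(USER_AGENT_SUFFIX_KEY));

        // Case 2: the user configured a suffix, so it is left untouched.
        Conf custom = new Conf();
        custom.set(USER_AGENT_SUFFIX_KEY, "my-team-pipeline");
        applyDefaultUserAgent(custom, "1.19.0");
        System.out.println(custom.get(USER_AGENT_SUFFIX_KEY));
    }
}
```

The check is against {{null}} rather than an empty string, matching step 2 above; whether an explicitly empty suffix should also count as "not set" is left open by the proposal.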