[ https://issues.apache.org/jira/browse/FLINK-37932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17980501#comment-17980501 ]
Chris Nauroth commented on FLINK-37932:
---------------------------------------

+1 (non-binding) for the proposal.

Minor nitpick: I think the configuration property you're referring to is {{fs.gs.application.name.suffix}}.

[https://github.com/GoogleCloudDataproc/hadoop-connectors/blob/v2.2.28/gcs/CONFIGURATION.md#http-transport-configuration]

> Set a Default User Agent for GCS FileSystem Connector for Better Observability
> ------------------------------------------------------------------------------
>
>                 Key: FLINK-37932
>                 URL: https://issues.apache.org/jira/browse/FLINK-37932
>             Project: Flink
>          Issue Type: Improvement
>          Components: Connectors / Hadoop Compatibility, FileSystems
>            Reporter: Shruti Singhania
>            Priority: Minor
>              Labels: filesystem, gcs, google, observability
>
> *1. Problem Statement*
> When Apache Flink interacts with Google Cloud Storage (GCS) via the
> {{flink-gs-fs-hadoop}} connector, the requests made to the GCS API do not
> contain any Flink-specific identifiers by default. While users can manually
> configure a user agent suffix via the {{fs.gs.user.agent.suffix}} property in
> {{core-site.xml}}, most users do not.
> This lack of a default identifier makes it difficult for users and cloud
> administrators to distinguish Flink-originated GCS traffic from other
> applications in GCP Cloud Audit Logs. This complicates monitoring, debugging,
> and cost attribution for Flink jobs operating in a cloud-native environment.
>
> *2. Proposed Solution*
> This proposal is to programmatically set a default user agent suffix within
> the Flink GCS filesystem factory. This suffix would be added _only if_ one is
> not already provided by the user in their configuration.
> The proposed default user agent suffix will include the Flink version, for
> example: {{Apache Flink 1.19.0}}.
> This provides a "sensible default" that enhances the experience for users
> running Flink on Google Cloud, while fully respecting any custom
> configuration.
>
> *3. Implementation Details*
> The change would be implemented in the {{flink-gs-fs-hadoop}} module.
> * *Module:* {{flink-gs-fs-hadoop}}
> * *Class:* {{org.apache.flink.fs.gs.GSFileSystemFactory}}
> * *Method:* {{create(URI fsUri)}}
> The implementation logic is as follows:
> # In the {{create}} method, after loading the Hadoop configuration, check if
> the {{fs.gs.user.agent.suffix}} key is already set.
> # If the key is not set (i.e., its value is {{null}}), programmatically
> set it on the {{Configuration}} object. The value should be dynamically
> generated using {{EnvironmentInformation.getVersion()}} to ensure version
> accuracy.
> # If the key is already set, do nothing, preserving the user's configuration.
> # Proceed with {{FileSystem}} instantiation using the (now guaranteed to be
> populated) {{Configuration}} object.
>
> *4. Impact on Users*
> * This is a *non-breaking change*.
> * Users who have already configured a custom {{fs.gs.user.agent.suffix}}
> will see no difference in behavior.
> * Users who have _not_ configured this property will automatically gain
> improved observability in their GCP logs without needing to make any changes.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
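
For illustration, the check-then-set logic from section 3 of the quoted issue might look roughly like the sketch below. This is not the proposed patch. A plain Map stands in for Hadoop's Configuration, the Flink version is passed in as a string rather than read from EnvironmentInformation.getVersion(), and the class and method names (DefaultUserAgentSketch, applyDefaultSuffix) are hypothetical. The key used is the one suggested in the comment above, fs.gs.application.name.suffix; the issue text names fs.gs.user.agent.suffix instead.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative sketch only: a Map stands in for the Hadoop Configuration object. */
final class DefaultUserAgentSketch {

    // Key taken from the comment on the issue (assumption: this is the key the connector reads).
    static final String SUFFIX_KEY = "fs.gs.application.name.suffix";

    /**
     * Sets a default suffix only when no value is configured.
     *
     * @return true if the default was applied, false if an existing value was kept
     */
    static boolean applyDefaultSuffix(Map<String, String> conf, String flinkVersion) {
        if (conf.get(SUFFIX_KEY) != null) {
            // Step 3 of the issue: the user's own setting is left untouched.
            return false;
        }
        // Step 2 of the issue: fill in a default built from the Flink version.
        conf.put(SUFFIX_KEY, "Apache Flink " + flinkVersion);
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        applyDefaultSuffix(conf, "1.19.0");
        System.out.println(conf.get(SUFFIX_KEY));
    }
}
```

In the real factory the same null check would presumably run against the Hadoop Configuration loaded in create(URI fsUri), before the FileSystem is instantiated.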