[
https://issues.apache.org/jira/browse/FLINK-37932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated FLINK-37932:
-----------------------------------
Labels: filesystem gcs google observability pull-request-available (was:
filesystem gcs google observability)
> Set a Default User Agent for GCS FileSystem Connector for Better Observability
> ------------------------------------------------------------------------------
>
> Key: FLINK-37932
> URL: https://issues.apache.org/jira/browse/FLINK-37932
> Project: Flink
> Issue Type: Improvement
> Components: Connectors / Hadoop Compatibility, FileSystems
> Reporter: Shruti Singhania
> Priority: Minor
> Labels: filesystem, gcs, google, observability,
> pull-request-available
>
> *1. Problem Statement*
> When Apache Flink interacts with Google Cloud Storage (GCS) via the
> {{flink-gs-fs-hadoop}} connector, the requests made to the GCS API do not
> contain any Flink-specific identifiers by default. While users can manually
> configure a user agent suffix via the {{fs.gs.user.agent.suffix}} property in
> {{core-site.xml}}, most users do not.
> This lack of a default identifier makes it difficult for users and cloud
> administrators to distinguish Flink-originated GCS traffic from other
> applications in GCP Cloud Audit Logs. This complicates monitoring, debugging,
> and cost attribution for Flink jobs operating in a cloud-native environment.
>
> *2. Proposed Solution*
> The proposal is to set a default user agent suffix programmatically within
> the Flink GCS filesystem factory. The suffix would be applied _only if_ the
> user has not already provided one in their configuration.
> The proposed default user agent suffix will include the Flink version, for
> example: {{Apache Flink 1.19.0}}.
> This provides a "sensible default" that enhances the experience for users
> running Flink on Google Cloud, while fully respecting any custom
> configuration.
>
> *3. Implementation Details*
> The change would be implemented in the {{flink-gs-fs-hadoop}} module.
> * *Module:* {{flink-gs-fs-hadoop}}
> * *Class:* {{org.apache.flink.fs.gs.GSFileSystemFactory}}
> * *Method:* {{create(URI fsUri)}}
> The implementation logic is as follows:
> # In the {{create}} method, after loading the Hadoop configuration, check if
> the {{fs.gs.user.agent.suffix}} key is already set.
> # If the key is not set (i.e., its value is {{null}}), programmatically
> set it on the {{Configuration}} object. The value should be dynamically
> generated using {{EnvironmentInformation.getVersion()}} to ensure version
> accuracy.
> # If the key is already set, do nothing, preserving the user's configuration.
> # Proceed with {{FileSystem}} instantiation using the {{Configuration}}
> object, in which the key is now guaranteed to be set.
>
> *4. Impact on Users*
> * This is a *non-breaking change*.
> * Users who have already configured a custom {{fs.gs.user.agent.suffix}}
> will see no difference in behavior.
> * Users who have _not_ configured this property will automatically gain
> improved observability in their GCP logs without needing to make any changes.
>
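The check-then-set logic described under "Implementation Details" can be sketched roughly as follows. This is an illustration only, not the code from the pull request: a plain {{Map}} stands in for the Hadoop {{Configuration}} object, the version string is passed in as a parameter instead of being read from {{EnvironmentInformation.getVersion()}}, and the class and method names ({{UserAgentDefaulting}}, {{applyDefaultUserAgent}}) are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper illustrating the proposed defaulting rule.
// A Map<String, String> stands in for the Hadoop Configuration object (assumption).
final class UserAgentDefaulting {

    // Property name as given in the issue description.
    static final String USER_AGENT_SUFFIX_KEY = "fs.gs.user.agent.suffix";

    // Sets the default suffix only when the user has not configured one.
    // flinkVersion stands in for EnvironmentInformation.getVersion() (assumption).
    static void applyDefaultUserAgent(Map<String, String> hadoopConfig, String flinkVersion) {
        if (hadoopConfig.get(USER_AGENT_SUFFIX_KEY) == null) {
            hadoopConfig.put(USER_AGENT_SUFFIX_KEY, "Apache Flink " + flinkVersion);
        }
        // Otherwise do nothing, so the user's value is preserved.
    }

    public static void main(String[] args) {
        // Case 1: no suffix configured, so the default is applied.
        Map<String, String> unset = new HashMap<>();
        applyDefaultUserAgent(unset, "1.19.0");
        System.out.println(unset.get(USER_AGENT_SUFFIX_KEY));

        // Case 2: suffix already configured, so it is left untouched.
        Map<String, String> custom = new HashMap<>();
        custom.put(USER_AGENT_SUFFIX_KEY, "my-team-pipeline");
        applyDefaultUserAgent(custom, "1.19.0");
        System.out.println(custom.get(USER_AGENT_SUFFIX_KEY));
    }
}
```

In the real factory the same check would presumably run inside {{create(URI fsUri)}} right after the Hadoop configuration is loaded, using the {{Configuration}} object's own getter and setter rather than a map.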