Shruti Singhania created FLINK-37932:
----------------------------------------

             Summary: Set a Default User Agent for GCS FileSystem Connector for 
Better Observability
                 Key: FLINK-37932
                 URL: https://issues.apache.org/jira/browse/FLINK-37932
             Project: Flink
          Issue Type: Improvement
          Components: Connectors / Hadoop Compatibility, FileSystems
            Reporter: Shruti Singhania


*1. Problem Statement*

When Apache Flink interacts with Google Cloud Storage (GCS) via the 
{{flink-gs-fs-hadoop}} connector, the requests made to the GCS API do not 
contain any Flink-specific identifiers by default. While users can manually 
configure a user agent suffix via the {{fs.gs.user.agent.suffix}} property in 
{{core-site.xml}}, most users do not.

This lack of a default identifier makes it difficult for users and cloud 
administrators to distinguish Flink-originated GCS traffic from other 
applications in GCP Cloud Audit Logs. This complicates monitoring, debugging, 
and cost attribution for Flink jobs operating in a cloud-native environment.

 

*2. Proposed Solution*

The proposal is to set a default user agent suffix programmatically in the 
Flink GCS filesystem factory. The default would be applied _only if_ the user 
has not already provided one in their configuration.

The proposed default user agent suffix would include the Flink version, for 
example: {{Apache Flink 1.19.0}}.

This provides a "sensible default" that enhances the experience for users 
running Flink on Google Cloud, while fully respecting any custom configuration.

 

*3. Implementation Details*

The change would be confined to a single method:
 * *Module:* {{flink-gs-fs-hadoop}}
 * *Class:* {{org.apache.flink.fs.gs.GSFileSystemFactory}}
 * *Method:* {{create(URI fsUri)}}

The implementation logic is as follows:
 # In the {{create}} method, after loading the Hadoop configuration, check if 
the {{fs.gs.user.agent.suffix}} key is already set.
 # If the key is not set (i.e., its value is {{null}}), programmatically 
set it on the {{Configuration}} object. The value should be dynamically 
generated using {{EnvironmentInformation.getVersion()}} to ensure version 
accuracy.
 # If the key is already set, do nothing, preserving the user's configuration.
 # Proceed with {{FileSystem}} instantiation using the (now guaranteed to be 
populated) {{Configuration}} object.
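
The steps above can be sketched roughly as follows. This is a minimal illustration, not the actual patch: a plain {{Map}} stands in for the Hadoop {{Configuration}} object so the example runs without Flink or Hadoop on the classpath, and the class name {{GcsUserAgentDefaults}} and method name {{applyDefaultUserAgent}} are hypothetical. The version string is passed in as a parameter; in the factory it would come from {{EnvironmentInformation.getVersion()}}.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch of the "apply a default only if unset" logic. A Map stands in for
 * Hadoop's Configuration; class and method names are hypothetical.
 */
final class GcsUserAgentDefaults {

    /** Property name taken from the issue text. */
    static final String USER_AGENT_SUFFIX_KEY = "fs.gs.user.agent.suffix";

    private GcsUserAgentDefaults() {}

    /**
     * Sets the default suffix when the key is absent and leaves any
     * user-provided value untouched.
     *
     * @param hadoopConf stand-in for the loaded Hadoop configuration
     * @param flinkVersion assumed to come from EnvironmentInformation.getVersion()
     * @return true if the default was applied, false if a value already existed
     */
    static boolean applyDefaultUserAgent(Map<String, String> hadoopConf, String flinkVersion) {
        // Steps 1 and 3: if the user already configured a suffix, do nothing.
        if (hadoopConf.get(USER_AGENT_SUFFIX_KEY) != null) {
            return false;
        }
        // Step 2: no suffix configured, so set the Flink default.
        hadoopConf.put(USER_AGENT_SUFFIX_KEY, "Apache Flink " + flinkVersion);
        return true;
    }

    public static void main(String[] args) {
        // Case 1: nothing configured, the default is applied.
        Map<String, String> unset = new HashMap<>();
        applyDefaultUserAgent(unset, "1.19.0");
        System.out.println(unset.get(USER_AGENT_SUFFIX_KEY)); // Apache Flink 1.19.0

        // Case 2: a custom suffix is preserved as-is.
        Map<String, String> custom = new HashMap<>();
        custom.put(USER_AGENT_SUFFIX_KEY, "my-team-job");
        applyDefaultUserAgent(custom, "1.19.0");
        System.out.println(custom.get(USER_AGENT_SUFFIX_KEY)); // my-team-job
    }
}
```

In the real factory the same check would presumably use {{Configuration.get(String)}}, which returns {{null}} for an unset key, and {{Configuration.set(String, String)}} in place of the {{Map}} calls shown here.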

 

*4. Impact on Users*
 * This is a *non-breaking change*.
 * Users who have already configured a custom {{fs.gs.user.agent.suffix}} will 
see no difference in behavior.
 * Users who have _not_ configured this property will automatically gain 
improved observability in their GCP logs without needing to make any changes.

 



