[ 
https://issues.apache.org/jira/browse/FLINK-37932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17980501#comment-17980501
 ] 

Chris Nauroth commented on FLINK-37932:
---------------------------------------

+1 (non-binding) for the proposal. Minor nitpick: I think the configuration 
property you're referring to is {{fs.gs.application.name.suffix}}.

 

[https://github.com/GoogleCloudDataproc/hadoop-connectors/blob/v2.2.28/gcs/CONFIGURATION.md#http-transport-configuration]

 

> Set a Default User Agent for GCS FileSystem Connector for Better Observability
> ------------------------------------------------------------------------------
>
>                 Key: FLINK-37932
>                 URL: https://issues.apache.org/jira/browse/FLINK-37932
>             Project: Flink
>          Issue Type: Improvement
>          Components: Connectors / Hadoop Compatibility, FileSystems
>            Reporter: Shruti Singhania
>            Priority: Minor
>              Labels: filesystem, gcs, google, observability
>
> *1. Problem Statement*
> When Apache Flink interacts with Google Cloud Storage (GCS) via the 
> {{flink-gs-fs-hadoop}} connector, the requests made to the GCS API do not 
> contain any Flink-specific identifiers by default. While users can manually 
> configure a user agent suffix via the {{fs.gs.user.agent.suffix}} property in 
> {{core-site.xml}}, most users do not.
> This lack of a default identifier makes it difficult for users and cloud 
> administrators to distinguish Flink-originated GCS traffic from other 
> applications in GCP Cloud Audit Logs. This complicates monitoring, debugging, 
> and cost attribution for Flink jobs operating in a cloud-native environment.
>  
> *2. Proposed Solution*
> This proposal is to programmatically set a default user agent suffix within 
> the Flink GCS filesystem factory. This suffix would be added _only if_ one is 
> not already provided by the user in their configuration.
> The proposed default user agent suffix will include the Flink version, for 
> example: {{Apache Flink 1.19.0}}.
> This provides a "sensible default" that enhances the experience for users 
> running Flink on Google Cloud, while fully respecting any custom 
> configuration.
>  
> *3. Implementation Details*
> The change would be implemented in the {{flink-gs-fs-hadoop}} module.
>  * *Module:* {{flink-gs-fs-hadoop}}
>  * *Class:* {{org.apache.flink.fs.gs.GSFileSystemFactory}}
>  * *Method:* {{create(URI fsUri)}}
> The implementation logic is as follows:
>  # In the {{create}} method, after loading the Hadoop configuration, check if 
> the {{fs.gs.user.agent.suffix}} key is already set.
>  # If the key is not set (i.e., its value is {{null}}), programmatically 
> set it on the {{Configuration}} object. The value should be dynamically 
> generated using {{EnvironmentInformation.getVersion()}} to ensure version 
> accuracy.
>  # If the key is already set, do nothing, preserving the user's configuration.
>  # Proceed with {{FileSystem}} instantiation using the (now guaranteed to be 
> populated) {{Configuration}} object.
>  
> *4. Impact on Users*
>  * This is a *non-breaking change*.
>  * Users who have already configured a custom {{fs.gs.user.agent.suffix}} 
> will see no difference in behavior.
>  * Users who have _not_ configured this property will automatically gain 
> improved observability in their GCP logs without needing to make any changes.
>  
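
For reference, the logic in section 3 of the quoted description could be sketched roughly as below. This is only an illustration under stated assumptions, not the actual patch: a plain Map stands in for the Hadoop Configuration object, the class DefaultUserAgentSketch and the method applyDefaultSuffix are hypothetical names, and the Flink version is passed in as a string instead of being read from EnvironmentInformation.getVersion(). The property key is a parameter because the description and the comment above name different keys.

```java
import java.util.HashMap;
import java.util.Map;

public class DefaultUserAgentSketch {

    // Hypothetical helper, not part of Flink. The Map is a stand-in for
    // org.apache.hadoop.conf.Configuration so the sketch runs on its own.
    static void applyDefaultSuffix(
            Map<String, String> conf, String suffixKey, String flinkVersion) {
        // Only set the default when the user has not configured a value.
        if (conf.get(suffixKey) == null) {
            conf.put(suffixKey, "Apache Flink " + flinkVersion);
        }
        // Otherwise do nothing, preserving the user's configuration.
    }

    public static void main(String[] args) {
        // Key name taken from the comment above; treat it as an assumption.
        String key = "fs.gs.application.name.suffix";

        // Case 1: nothing configured, so the default is applied.
        Map<String, String> unset = new HashMap<>();
        applyDefaultSuffix(unset, key, "1.19.0");
        System.out.println(unset.get(key));

        // Case 2: user-provided value is left untouched.
        Map<String, String> custom = new HashMap<>();
        custom.put(key, "my-team-job");
        applyDefaultSuffix(custom, key, "1.19.0");
        System.out.println(custom.get(key));
    }
}
```

In the real factory the same check would presumably run on the loaded Hadoop configuration inside create(URI fsUri), before the FileSystem is instantiated.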



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
