shunping-google commented on issue #26644:
URL: https://github.com/apache/beam/issues/26644#issuecomment-1553358649

   The Beam Java APIs for performing operations on GCS are defined in 
`org.apache.beam.sdk.extensions.gcp.util.GcsUtil`. Specifically, the 
private instance variable `storageClient`, of type `Storage` [(code 
link)](https://github.com/apache/beam/blob/release-2.47.0/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java#L166),
 is the GCS client API provider. For each GCS operation, `storageClient` 
internally constructs the corresponding HTTP request and sends it to GCS for 
execution. The user agent in the HTTP header is assembled when the HTTP request 
is constructed.
   
   There are two code paths for invoking GCS APIs in `GcsUtil`:
   * The first code path can be seen in the public method 
[getObject](https://github.com/apache/beam/blob/release-2.47.0/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java#L336)
 in `GcsUtil`. It constructs the Get operation request through the following chain:
   
     `com.google.api.services.storage.Storage.Objects.Get`
      –> `com.google.api.services.storage.StorageRequest`
      –> `com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest`
      –> `com.google.api.client.googleapis.services.AbstractGoogleClientRequest`
   
     Inside `AbstractGoogleClientRequest`, the `applicationName` is included as 
a prefix of the user agent string [(code 
link)](https://github.com/googleapis/google-api-java-client/blob/v2.2.0/google-api-client/src/main/java/com/google/api/client/googleapis/services/AbstractGoogleClientRequest.java#L131).
   
   * A slightly different code path is in the public method of `GcsUtil` that returns a 
[SeekableByteChannel](https://github.com/apache/beam/blob/release-2.47.0/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java#L463),
 where the public method 
[open](https://github.com/GoogleCloudDataproc/hadoop-connectors/blob/v2.2.6/gcsio/src/main/java/com/google/cloud/hadoop/gcsio/GoogleCloudStorage.java#L155)
 of `GoogleCloudStorage` is called. However, this path also internally uses the GCS 
client API provider mentioned above to construct HTTP requests: [code link 
1](https://github.com/GoogleCloudDataproc/hadoop-connectors/blob/v2.2.6/gcsio/src/main/java/com/google/cloud/hadoop/gcsio/GoogleCloudStorageImpl.java#L804),
 [code link 
2](https://github.com/GoogleCloudDataproc/hadoop-connectors/blob/v2.2.6/gcsio/src/main/java/com/google/cloud/hadoop/gcsio/GoogleCloudStorageReadChannel.java#L1125),
 [code link 
3](https://github.com/GoogleCloudDataproc/hadoop-connectors/blob/v2.2.6/gcsio/src/main/java/com/google/cloud/hadoop/gcsio/StorageRequestFactory.java#L40).
   
   Note that in both code paths, the `applicationName` of the `Storage` 
object is the prefix of the user agent string. 
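
As a rough illustration of the point above, the following is a simplified model of how the user agent might be assembled from `applicationName`. It is not the client library's actual code: the class `UserAgentSketch`, the method `buildUserAgent`, and the fixed suffix string are all hypothetical, and the suffix is simply copied from the example string in this comment.

```java
public class UserAgentSketch {
    // Assumption: a fixed library suffix, taken from the example in this comment.
    // The real client library appends its own name and version here.
    static final String LIBRARY_SUFFIX = "Google-API-Java-Client/2.0.0";

    // Simplified model: the application name, when present, is placed
    // in front of the library suffix, separated by a single space.
    static String buildUserAgent(String applicationName) {
        if (applicationName == null || applicationName.isEmpty()) {
            return LIBRARY_SUFFIX;
        }
        return applicationName + " " + LIBRARY_SUFFIX;
    }

    public static void main(String[] args) {
        // prints: TransportTest Google-API-Java-Client/2.0.0
        System.out.println(buildUserAgent("TransportTest"));
    }
}
```

Because the application name always comes first in this model, anything placed at the front of `applicationName` ends up at the front of the user agent string.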
   
   Therefore, to add the requested string "(GPN:Beam)" to the user agent, I 
propose prepending it to the original `applicationName` when creating the builder 
object of the `Storage` class in 
[Transport](https://github.com/apache/beam/blob/release-2.47.0/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/Transport.java#L104).
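
A minimal sketch of the prepending step, assuming a small helper is used to compute the new application name. The class `GpnPrefixSketch` and the method `withGpnPrefix` are hypothetical names, and the null handling and the check that avoids adding the prefix twice are my own additions rather than part of the proposal.

```java
public class GpnPrefixSketch {
    // The string requested in the issue.
    static final String GPN_PREFIX = "(GPN:Beam)";

    // Hypothetical helper: returns the application name with the GPN prefix in front.
    static String withGpnPrefix(String applicationName) {
        if (applicationName == null || applicationName.isEmpty()) {
            // Assumption: with no application name, use the prefix alone.
            return GPN_PREFIX;
        }
        if (applicationName.startsWith(GPN_PREFIX)) {
            // Assumption: avoid prepending twice if the prefix is already there.
            return applicationName;
        }
        return GPN_PREFIX + " " + applicationName;
    }

    public static void main(String[] args) {
        // prints: (GPN:Beam) TransportTest
        System.out.println(withGpnPrefix("TransportTest"));
    }
}
```

In `Transport`, the result of such a helper would presumably be passed to the `Storage` builder's `setApplicationName` call in place of the original name.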
   
   Here is an example user agent string before and after the change.
   
   Before:
   "TransportTest Google-API-Java-Client/2.0.0"
   
   After:
   "**(GPN:Beam)** TransportTest Google-API-Java-Client/2.0.0"
   

