shunping-google commented on issue #26644: URL: https://github.com/apache/beam/issues/26644#issuecomment-1553358649
The Beam Java APIs for GCS operations are defined in `org.apache.beam.sdk.extensions.gcp.util.GcsUtil.java`. Specifically, the private instance variable `storageClient`, of class `Storage` [(code link)](https://github.com/apache/beam/blob/release-2.47.0/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java#L166), is the GCS client API provider. For each GCS operation, `storageClient` internally constructs the corresponding HTTP request and sends it to GCS for execution. The user agent in the HTTP header is assembled when the HTTP request is constructed.

There are two code paths for invoking GCS APIs in `GcsUtil`:

* One code path can be seen in the public method [getObject](https://github.com/apache/beam/blob/release-2.47.0/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java#L336) in `GcsUtil`. It constructs the Get operation request through the following chain:
  `com.google.api.services.storage.Storage.Objects.Get` -> `com.google.api.services.storage.StorageRequest` -> `com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest` -> `com.google.api.client.googleapis.services.AbstractGoogleClientRequest`
  Inside `AbstractGoogleClientRequest`, the `applicationName` is included as a prefix in the user agent string [(code link)](https://github.com/googleapis/google-api-java-client/blob/v2.2.0/google-api-client/src/main/java/com/google/api/client/googleapis/services/AbstractGoogleClientRequest.java#L131).
* A slightly different code path is in the public method [SeekableByteChannel](https://github.com/apache/beam/blob/release-2.47.0/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java#L463) in `GcsUtil`, which calls the public method [open](https://github.com/GoogleCloudDataproc/hadoop-connectors/blob/v2.2.6/gcsio/src/main/java/com/google/cloud/hadoop/gcsio/GoogleCloudStorage.java#L155) of `GoogleCloudStorage`. However, this path also internally uses the GCS client API provider mentioned above to construct HTTP requests: [code link 1](https://github.com/GoogleCloudDataproc/hadoop-connectors/blob/v2.2.6/gcsio/src/main/java/com/google/cloud/hadoop/gcsio/GoogleCloudStorageImpl.java#L804), [code link 2](https://github.com/GoogleCloudDataproc/hadoop-connectors/blob/v2.2.6/gcsio/src/main/java/com/google/cloud/hadoop/gcsio/GoogleCloudStorageReadChannel.java#L1125), [code link 3](https://github.com/GoogleCloudDataproc/hadoop-connectors/blob/v2.2.6/gcsio/src/main/java/com/google/cloud/hadoop/gcsio/StorageRequestFactory.java#L40).

Note that in either code path, the `applicationName` of the `Storage` object is the prefix of the user agent string. Therefore, to add the requested string "(GPN:Beam)" to the user agent, I propose prepending it to the original `applicationName` when creating the builder object of the `Storage` class in [Transport](https://github.com/apache/beam/blob/release-2.47.0/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/Transport.java#L104).

Here is an example user agent string before and after the change.

Before: "TransportTest Google-API-Java-Client/2.0.0"

After: "**(GPN:Beam)** TransportTest Google-API-Java-Client/2.0.0"
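
To make the proposal concrete, here is a minimal, self-contained sketch of the string handling only. It is a simplified model, not the actual Beam or google-api-java-client code: the class `UserAgentSketch`, its method names, the null/empty handling, and the hard-coded library token `Google-API-Java-Client/2.0.0` (taken from the example strings above) are all assumptions made for illustration.

```java
// Hypothetical model of the proposal; not the real Transport or
// AbstractGoogleClientRequest implementation.
final class UserAgentSketch {
  // Marker requested in the issue.
  static final String GPN_PREFIX = "(GPN:Beam)";

  // Assumed library token, copied from the example user agent strings.
  static final String LIBRARY_TOKEN = "Google-API-Java-Client/2.0.0";

  /**
   * Prepends the marker to the application name. Returning the marker alone
   * for a null or empty name is an assumption of this sketch.
   */
  static String withGpnPrefix(String applicationName) {
    if (applicationName == null || applicationName.isEmpty()) {
      return GPN_PREFIX;
    }
    return GPN_PREFIX + " " + applicationName;
  }

  /** Models the client using the application name as the user agent prefix. */
  static String userAgent(String applicationName) {
    if (applicationName == null) {
      return LIBRARY_TOKEN;
    }
    return applicationName + " " + LIBRARY_TOKEN;
  }

  public static void main(String[] args) {
    String original = "TransportTest";
    System.out.println("Before: " + userAgent(original));
    System.out.println("After:  " + userAgent(withGpnPrefix(original)));
  }
}
```

In the real change, the prefixed name would presumably be passed as the application name when the `Storage` builder is created in `Transport`, so both code paths described above would pick it up without further modification.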
