ajamato commented on pull request #14770: URL: https://github.com/apache/beam/pull/14770#issuecomment-843651996
I am not sure why, but this doesn't show up in my GitHub mentions or reviews. Please DM me if you need me to look at the PR.

> Should all the `self._client.objects.Get()` calls be added to the metrics or just the ones pointed out in the document? Like this [function](https://github.com/apache/beam/blob/309fc99a8a94a8dc42a4e817002cc084da5a2811/sdks/python/apache_beam/io/gcp/gcsio.py#L593), which also makes this request and is not pointed out in the document.

Anywhere the GCS IO reads or writes objects to GCS needs instrumentation. I don't think I identified all the locations, so please see if you can locate them all.

> In the implementation guide, a reference is made to the `GcsUtil.java.getObject`[1]. However, I'm not sure if the metrics should be added in the Python code equivalent (which I think is this[2]) or in this specific piece of code[1].
>
> [1] [GcsUtil.java#L286](https://github.com/apache/beam/blob/3bb232fb098700de408f574585dfe74bbaff7230/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java#L286)
> [2] [gcsio.py#L613](https://github.com/apache/beam/blob/920553e8f2743d2709b786c16a2f916a2a8c9389/sdks/python/apache_beam/io/gcp/gcsio.py#L613)

Whichever is the code that GCSIO actually uses to read and write objects to GCS. I would run a pipeline on the direct runner and add logging to identify it. Also check whether there are extra modes that might cause it to use a different code path.
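One way to do the logging step is sketched below. It wraps a client method so that every call records which function issued it. The `log_calls` helper, the `FakeObjects` class, and the two caller functions are hypothetical stand-ins for illustration, not Beam code. In practice the same wrapper could be applied to the object behind `self._client.objects` in `gcsio.py` before running a pipeline on the direct runner.

```python
import functools
import logging
import traceback

calls = []  # (method name, calling function) pairs, in call order


def log_calls(obj, method_name):
    """Replace obj.<method_name> with a wrapper that logs each call and its caller."""
    original = getattr(obj, method_name)

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        # With limit=2 the stack is [caller, wrapper]; entry 0 is the code
        # path that issued the request.
        caller = traceback.extract_stack(limit=2)[0]
        calls.append((method_name, caller.name))
        logging.info(
            "%s called from %s:%d (%s)",
            method_name, caller.filename, caller.lineno, caller.name)
        return original(*args, **kwargs)

    setattr(obj, method_name, wrapper)


class FakeObjects:
    """Hypothetical stand-in for the GCS client's `objects` service."""

    def Get(self, request):
        return {"name": request}


# Two hypothetical code paths that both end up issuing a Get request.
def read_metadata(objects, path):
    return objects.Get(path)


def check_exists(objects, path):
    return objects.Get(path) is not None


objects = FakeObjects()
log_calls(objects, "Get")
read_metadata(objects, "gs://bucket/a")
check_exists(objects, "gs://bucket/b")
```

After the run, `calls` lists every function that reached `Get`, which gives a starting list of places that may need instrumentation. Code paths that the pipeline never exercised will not appear, so different modes need separate runs.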
