ajamato commented on pull request #14770:
URL: https://github.com/apache/beam/pull/14770#issuecomment-843651996


   I am not sure why, but this doesn't show up in my GitHub mentions or reviews.
   
   Please DM me if you need me to look at the PR.
   
   
   > Should all the `self._client.objects.Get()` calls be added to the metrics, 
or just the ones pointed out in the document? For example, this 
[function](https://github.com/apache/beam/blob/309fc99a8a94a8dc42a4e817002cc084da5a2811/sdks/python/apache_beam/io/gcp/gcsio.py#L593)
 also makes this request, and it's not pointed out in the document.
   
   Anywhere the GCS IO reads or writes objects in GCS needs instrumentation. I 
don't think I identified all the locations, so please see if you can locate 
them all.
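
   As a rough illustration of instrumenting every read/write call site the 
same way, here is a minimal sketch. It assumes a plain in-memory counter 
rather than Beam's real metrics API, and `FakeObjects`, `count_requests`, and 
`REQUEST_COUNTS` are hypothetical names used only for this example.

```python
import collections
import functools

# Hypothetical stand-in for a metrics backend; this is NOT Beam's metrics API.
REQUEST_COUNTS = collections.Counter()


def count_requests(method_name):
    """Decorator that records one count per call, keyed by method and status."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            status = "ok"
            try:
                return func(*args, **kwargs)
            except Exception as exc:
                # Use the exception type as the status label, then re-raise.
                status = type(exc).__name__
                raise
            finally:
                REQUEST_COUNTS[(method_name, status)] += 1
        return wrapper
    return decorator


class FakeObjects:
    """Hypothetical stand-in for the `self._client.objects` service."""

    @count_requests("objects.Get")
    def Get(self, name):
        if name == "missing":
            raise KeyError(name)
        return {"name": name}
```

   Wrapping each call site with the same helper makes it harder to miss one, 
since successes and failures are both counted in a single place.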
   
   > In the implementation guide, a reference is made to 
`GcsUtil.java.getObject`[1]. However, I'm not sure if the metrics should be 
added in the Python equivalent (which I think is this[2]) or in this 
specific piece of code[1].
   > 
   > [1] 
[GcsUtil.java#L286](https://github.com/apache/beam/blob/3bb232fb098700de408f574585dfe74bbaff7230/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java#L286)
   > [2] 
[gcsio.py#L613](https://github.com/apache/beam/blob/920553e8f2743d2709b786c16a2f916a2a8c9389/sdks/python/apache_beam/io/gcp/gcsio.py#L613)
   
   Whichever code GCSIO actually uses to read and write objects to GCS.
   I would run a pipeline on the direct runner and add logging to identify it.
   Also check whether there are extra modes that might cause it to use a 
different code path.
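
   One way to do that kind of logging is sketched below. It assumes you can 
get hold of the client object before the pipeline runs; `log_call_sites`, 
`FakeObjects`, `size`, and `open_for_read` are hypothetical names for 
illustration, not functions in `gcsio.py`.

```python
import functools
import logging
import traceback

_LOGGER = logging.getLogger("gcs_call_sites")


def log_call_sites(obj, method_name):
    """Wrap obj.<method_name> so every call logs which function invoked it.

    Returns a list that accumulates the caller function names in call order.
    """
    original = getattr(obj, method_name)
    callers = []

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        # With limit=2 the stack holds [caller, wrapper]; take the caller.
        caller = traceback.extract_stack(limit=2)[0]
        callers.append(caller.name)
        _LOGGER.info(
            "%s called from %s (%s:%d)",
            method_name, caller.name, caller.filename, caller.lineno)
        return original(*args, **kwargs)

    setattr(obj, method_name, wrapper)
    return callers


class FakeObjects:
    """Hypothetical stand-in for the `self._client.objects` service."""

    def Get(self, name):
        return {"name": name}


def size(objects, path):
    # Hypothetical code path 1 that issues a Get request.
    return objects.Get(path)


def open_for_read(objects, path):
    # Hypothetical code path 2 that issues the same request.
    return objects.Get(path)
```

   Calling `log_call_sites(objs, "Get")` on the client and then running the 
pipeline would list each function that reaches the request, which helps 
reveal code paths that only some modes exercise.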

