ajamato commented on a change in pull request #14770:
URL: https://github.com/apache/beam/pull/14770#discussion_r629576291



##########
File path: sdks/python/apache_beam/io/gcp/gcsio.py
##########
@@ -586,7 +589,25 @@ def __init__(self, client, path, buffer_size):
         auto_transfer=False,
         chunksize=self._buffer_size,
         num_retries=20)
-    self._client.objects.Get(self._get_request, download=self._downloader)
+
+    # Create a request count metric
+    resource = resource_identifiers.GoogleCloudStorage(self._bucket)
+    labels = {

Review comment:
       You may not have it initially, so perhaps you could set it to blank at 
first and populate it once you have it. (Alternatively, you could make a 
request to obtain it before the initial requests, but that would sacrifice 
some performance, so I am not sure it's a good idea.)
   
   After the first request you will have a copy of it on the response
   https://cloud.google.com/storage/docs/json_api/v1/objects#resource
   acls->projectTeam->projectNumber
   
   Just make sure to save it somewhere so that it's available again on the 
next call (please check whether the object is destroyed, in which case you 
would lose the reference).
   
   @chamikaramj may have some other suggestions.
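
   A minimal sketch of the "start blank, populate after the first response" 
idea. It assumes the response can be read as a plain dict shaped like the 
object resource linked above (an `acl` list whose entries may carry 
`projectTeam.projectNumber`); `ProjectNumberCache` and its method name are 
hypothetical and not part of gcsio.py. The `acl` field may not be present on 
every response, so the cache simply stays blank in that case.

```python
from typing import Any, Mapping


class ProjectNumberCache:
    """Holds a project number that is unknown until a response supplies it.

    Hypothetical helper for illustration only.
    """

    def __init__(self) -> None:
        # Start blank; labels built before the first response carry ''.
        self.project_number = ''

    def update_from_response(self, response: Mapping[str, Any]) -> str:
        """Fills the cached value from an object resource if still blank."""
        if self.project_number:
            # Already known: keep the saved value for later calls.
            return self.project_number
        for entry in response.get('acl') or []:
            number = (entry.get('projectTeam') or {}).get('projectNumber')
            if number:
                self.project_number = str(number)
                break
        return self.project_number
```

   The cache has to live on something that outlives a single request (for 
example the long-lived client wrapper rather than a per-file reader), 
otherwise the saved value is lost along with the object.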

##########
File path: sdks/python/apache_beam/io/gcp/gcsio.py
##########
@@ -586,7 +589,25 @@ def __init__(self, client, path, buffer_size):
         auto_transfer=False,
         chunksize=self._buffer_size,
         num_retries=20)
-    self._client.objects.Get(self._get_request, download=self._downloader)
+
+    # Create a request count metric
+    resource = resource_identifiers.GoogleCloudStorage(self._bucket)
+    labels = {
+        monitoring_infos.SERVICE_LABEL: 'Storage',
+        monitoring_infos.METHOD_LABEL: 'GcsObjectsInsert',
+        monitoring_infos.RESOURCE_LABEL: resource,
+        monitoring_infos.GCS_BUCKET_LABEL: self._bucket,
+    }
+    service_call_metric = ServiceCallMetric(
+        request_count_urn=monitoring_infos.API_REQUEST_COUNT_URN,
+        base_labels=labels)
+
+    try:
+      response = self._client.objects.Get(

Review comment:
       Please take a look at the public docs for the API to determine an 
answer to this; I am not immediately sure. The response protos should be 
available here.
   I suspect this API will only return an HTTP error code:
   https://cloud.google.com/storage/docs/json_api/v1/status-codes
   
   But please go through the API reference here to confirm, and make sure 
there isn't an additional error in the body:
   https://cloud.google.com/storage/docs/json_api/v1/objects/get
   
   The format of the response is here
   https://cloud.google.com/storage/docs/json_api/v1/objects#resource
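
   If the API does turn out to report failures only through the HTTP status 
code, the status label could be derived roughly as below. This is a sketch 
under that assumption: the mapping table, `status_label`, `call_and_record`, 
and `FakeHttpError` are all hypothetical names, and the label strings other 
than 'ok' and 'not_found' are illustrative rather than taken from the Beam 
code base.

```python
# Hypothetical mapping from HTTP status codes to metric status labels.
_HTTP_TO_STATUS = {
    400: 'invalid_argument',
    401: 'unauthenticated',
    403: 'permission_denied',
    404: 'not_found',
    409: 'already_exists',
    429: 'resource_exhausted',
    500: 'internal',
    503: 'unavailable',
}


def status_label(http_status):
    """Returns a status label for an HTTP status code."""
    if 200 <= http_status < 300:
        return 'ok'
    return _HTTP_TO_STATUS.get(http_status, 'unknown')


class FakeHttpError(Exception):
    """Stand-in for a client-library HTTP error (assumption, not a real API)."""

    def __init__(self, status_code):
        super().__init__('HTTP %d' % status_code)
        self.status_code = status_code


def call_and_record(request_fn, record_fn):
    """Runs request_fn and passes exactly one status label to record_fn."""
    try:
        response = request_fn()
    except FakeHttpError as e:
        record_fn(status_label(e.status_code))
        raise  # The caller still sees the original error.
    record_fn('ok')
    return response
```

   With this shape, a missing object surfaces as a 404 and is counted under 
'not_found', while the exception still propagates to the caller.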
   
   

##########
File path: sdks/python/apache_beam/io/gcp/gcsio_test.py
##########
@@ -751,6 +755,28 @@ def test_mime_binary_encoding(self):
     generator._handle_text(message)
     self.assertEqual(test_msg.encode('ascii'), output_buffer.getvalue())
 
+  def test_monitoring_info(self):
+    file_name = 'gs://gcsio-metrics-test/dummy_mode_file'
+    bucket, _ = gcsio.parse_gcs_path(file_name)
+    resource = resource_identifiers.GoogleCloudStorage(bucket)
+    labels = {
+        monitoring_infos.SERVICE_LABEL: 'Storage',
+        monitoring_infos.METHOD_LABEL: 'Objects.insert',
+        monitoring_infos.RESOURCE_LABEL: resource,
+        monitoring_infos.GCS_BUCKET_LABEL: bucket,
+        monitoring_infos.STATUS_LABEL: 'ok'
+    }
+
+    with self.gcs.open(file_name, 'w') as f:

Review comment:
       You can report that as the 'not_found' error/status code.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

