ajamato commented on a change in pull request #14770:
URL: https://github.com/apache/beam/pull/14770#discussion_r629576291
##########
File path: sdks/python/apache_beam/io/gcp/gcsio.py
##########
@@ -586,7 +589,25 @@ def __init__(self, client, path, buffer_size):
auto_transfer=False,
chunksize=self._buffer_size,
num_retries=20)
- self._client.objects.Get(self._get_request, download=self._downloader)
+
+ # Create a request count metric
+ resource = resource_identifiers.GoogleCloudStorage(self._bucket)
+ labels = {
Review comment:
You may not have the project number initially, so perhaps you could just set
it to blank at first and populate it once you have it. (Alternatively, make a
request to obtain it before you make the initial requests, but that would
sacrifice some performance, so I am not sure it's a good idea.)
After the first request you will have a copy of it on the response:
https://cloud.google.com/storage/docs/json_api/v1/objects#resource
acls->projectTeam->projectNumber
Just make sure to save it somewhere so that it's available again on the next
call. (Please check whether the object is destroyed and you lose the reference.)
@chamikaramj may have some other suggestions.
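A rough sketch of the "start blank, populate after the first response" idea, in case it helps. It assumes the response is available as a plain dict shaped like the JSON object resource linked above; the real client returns message objects, so the field access would differ. `extract_project_number` and `ProjectNumberCache` are hypothetical names, not existing Beam helpers, and the ACL entries may be absent from a given response, in which case the value simply stays blank.

```python
from typing import Any, Dict, Mapping, Optional


def extract_project_number(obj_resource: Mapping[str, Any]) -> Optional[str]:
    """Return the first projectTeam.projectNumber found in the ACL entries."""
    for entry in obj_resource.get('acl') or []:
        team = entry.get('projectTeam') or {}
        number = team.get('projectNumber')
        if number:
            return str(number)
    return None


class ProjectNumberCache:
    """Hypothetical holder that outlives individual reader/writer objects."""

    def __init__(self) -> None:
        self._by_bucket: Dict[str, str] = {}

    def get(self, bucket: str) -> str:
        # Blank until a response has supplied the value.
        return self._by_bucket.get(bucket, '')

    def update(self, bucket: str, obj_resource: Mapping[str, Any]) -> str:
        # Keep the previously cached value if this response lacks ACL data.
        number = extract_project_number(obj_resource)
        if number:
            self._by_bucket[bucket] = number
        return self.get(bucket)
```

Keeping the cache on something longer-lived than the downloader/uploader (for example the GcsIO instance or a module-level object) is what addresses the "object is destroyed" concern; where exactly it should live is left open here.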
##########
File path: sdks/python/apache_beam/io/gcp/gcsio.py
##########
@@ -586,7 +589,25 @@ def __init__(self, client, path, buffer_size):
auto_transfer=False,
chunksize=self._buffer_size,
num_retries=20)
- self._client.objects.Get(self._get_request, download=self._downloader)
+
+ # Create a request count metric
+ resource = resource_identifiers.GoogleCloudStorage(self._bucket)
+ labels = {
+ monitoring_infos.SERVICE_LABEL: 'Storage',
+ monitoring_infos.METHOD_LABEL: 'GcsObjectsInsert',
+ monitoring_infos.RESOURCE_LABEL: resource,
+ monitoring_infos.GCS_BUCKET_LABEL: self._bucket,
+ }
+ service_call_metric = ServiceCallMetric(
+ request_count_urn=monitoring_infos.API_REQUEST_COUNT_URN,
+ base_labels=labels)
+
+ try:
+ response = self._client.objects.Get(
Review comment:
Please take a look at the public docs for the API to determine the answer
to this; I am not immediately sure. The response protos should be available
here.
I suspect this API will only return an HTTP error code:
https://cloud.google.com/storage/docs/json_api/v1/status-codes
But please go through the API reference here to confirm, and make sure there
isn't an additional error in the body:
https://cloud.google.com/storage/docs/json_api/v1/objects/get
The format of the response is here:
https://cloud.google.com/storage/docs/json_api/v1/objects#resource
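If it does turn out that only the HTTP code matters, the recording could look roughly like the sketch below. Everything in it is a stand-in: `FakeHttpError`, `RequestCounter`, and `counted_get` are hypothetical names used so the snippet runs on its own, and the mapping table is illustrative (only 'ok' and 'not_found' are status strings mentioned in this review). It is not the actual ServiceCallMetric implementation.

```python
import collections

# Illustrative mapping from HTTP codes to status labels (an assumption).
HTTP_TO_STATUS = {
    200: 'ok',
    403: 'permission_denied',
    404: 'not_found',
    503: 'unavailable',
}


class FakeHttpError(Exception):
    """Stand-in for the client library's HTTP error type."""

    def __init__(self, status_code):
        super().__init__('HTTP %d' % status_code)
        self.status_code = status_code


class RequestCounter:
    """Stand-in metric: counts calls per status label."""

    def __init__(self, base_labels):
        self.base_labels = dict(base_labels)
        self.counts = collections.Counter()

    def call(self, http_status):
        # Unknown codes fall back to the numeric code as a string.
        status = HTTP_TO_STATUS.get(http_status, str(http_status))
        self.counts[status] += 1
        return status


def counted_get(fetch, counter):
    """Run fetch(), record its outcome on the counter, and re-raise errors."""
    try:
        response = fetch()
    except FakeHttpError as e:
        counter.call(e.status_code)
        raise
    counter.call(200)
    return response
```

Recording inside the `except` branch and then re-raising keeps the existing error handling unchanged while still counting failed requests.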
##########
File path: sdks/python/apache_beam/io/gcp/gcsio_test.py
##########
@@ -751,6 +755,28 @@ def test_mime_binary_encoding(self):
generator._handle_text(message)
self.assertEqual(test_msg.encode('ascii'), output_buffer.getvalue())
+ def test_monitoring_info(self):
+ file_name = 'gs://gcsio-metrics-test/dummy_mode_file'
+ bucket, _ = gcsio.parse_gcs_path(file_name)
+ resource = resource_identifiers.GoogleCloudStorage(bucket)
+ labels = {
+ monitoring_infos.SERVICE_LABEL: 'Storage',
+ monitoring_infos.METHOD_LABEL: 'Objects.insert',
+ monitoring_infos.RESOURCE_LABEL: resource,
+ monitoring_infos.GCS_BUCKET_LABEL: bucket,
+ monitoring_infos.STATUS_LABEL: 'ok'
+ }
+
+ with self.gcs.open(file_name, 'w') as f:
Review comment:
You can report that as the 'not_found' error/status code.