jalengg commented on code in PR #38398:
URL: https://github.com/apache/airflow/pull/38398#discussion_r1550939930
##########
airflow/providers/google/cloud/hooks/gcs.py:
##########
@@ -1006,6 +1006,27 @@ def get_md5hash(self, bucket_name: str, object_name: str) -> str:
         self.log.info("The md5Hash of %s is %s", object_name, blob_md5hash)
         return blob_md5hash
+    def get_metadata(self, bucket_name: str, object_name: str) -> dict | None:
+        """
+        Get the metadata of an object in Google Cloud Storage.
+
+        :param bucket_name: Name of the Google Cloud Storage bucket where the object is.
+        :param object_name: The name of the object containing the desired metadata
+        :return: The metadata associated with the object
+        """
+        self.log.info("Retrieving the metadata dict of object (%s) in bucket (%s)", object_name, bucket_name)
+        client = self.get_conn()
+        bucket = client.bucket(bucket_name)
+        blob = bucket.get_blob(blob_name=object_name)
+        if blob is None:
+            raise ValueError(f"Object ({object_name}) not found in bucket ({bucket_name})")
+        blob_metadata = blob.metadata
+        if blob_metadata:
+            self.log.info("Retrieved metadata of object (%s) with %s fields", object_name, len(blob_metadata))
+        else:
+            self.log.info("Metadata of object (%s) is empty or it does not exist", object_name)
Review Comment:
@eladkal If I understand correctly, you're concerned about two things: returning `None` when the metadata is empty, and what happens if the blob doesn't exist. In both cases, the current behavior is intended. An existing blob can legitimately have empty metadata, as documented in the [GCS python lib reference](https://cloud.google.com/python/docs/reference/storage/latest/google.cloud.storage.blob.Blob#google_cloud_storage_blob_Blob_metadata).
- Case 1: `blob` is `None`, so accessing `blob.metadata` would raise an `AttributeError` ('NoneType' object has no attribute 'metadata'). To avoid this, I added the `ValueError` in L1022.
- Case 2: `blob` exists and `blob.metadata` is populated: return `blob.metadata` (happy path).
- Case 3: `blob` exists but `blob.metadata` is `None`: return `None`. This is intended because the user might not know whether the blob has metadata and might call `get_metadata` precisely to check for it, so we treat `None` as an acceptable value for `blob.metadata`.
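
The three cases can be exercised without a GCS connection. The block below is a minimal, hypothetical mirror of the hook's branching, not the hook itself: `FakeBlob` and `FakeBucket` are assumed stand-ins for `google.cloud.storage.Blob` and `Bucket` (only `metadata` and `get_blob` are modeled), `get_metadata` is a free function rather than the `GCSHook` method, and it assumes the method ends with `return blob_metadata`, which the diff hunk cuts off. The sketch builds the `ValueError` message with an f-string, since `ValueError` does not apply `%s` substitution to extra constructor arguments.

```python
from __future__ import annotations

import logging

log = logging.getLogger("get_metadata_sketch")


class FakeBlob:
    """Hypothetical stand-in for google.cloud.storage.Blob (only `metadata` is modeled)."""

    def __init__(self, metadata: dict[str, str] | None = None) -> None:
        self.metadata = metadata


class FakeBucket:
    """Hypothetical stand-in for google.cloud.storage.Bucket, backed by a plain dict."""

    def __init__(self, blobs: dict[str, FakeBlob]) -> None:
        self._blobs = blobs

    def get_blob(self, blob_name: str) -> FakeBlob | None:
        # Like Bucket.get_blob: returns None when the object does not exist.
        return self._blobs.get(blob_name)


def get_metadata(bucket: FakeBucket, bucket_name: str, object_name: str) -> dict[str, str] | None:
    """Hypothetical free-function version of the hook method's three cases."""
    blob = bucket.get_blob(blob_name=object_name)
    if blob is None:
        # Case 1: a missing object is an error. Reading blob.metadata here
        # would raise AttributeError, so fail early with a clear message.
        raise ValueError(f"Object ({object_name}) not found in bucket ({bucket_name})")
    blob_metadata = blob.metadata
    if blob_metadata:
        # Case 2 (happy path): the object carries custom metadata.
        log.info("Retrieved metadata of object (%s) with %s fields", object_name, len(blob_metadata))
    else:
        # Case 3: the object exists but has no metadata; this is not an error.
        log.info("Metadata of object (%s) is empty or it does not exist", object_name)
    return blob_metadata


bucket = FakeBucket({
    "with-meta.csv": FakeBlob(metadata={"owner": "etl"}),
    "no-meta.csv": FakeBlob(),
})
print(get_metadata(bucket, "my-bucket", "with-meta.csv"))  # {'owner': 'etl'}
print(get_metadata(bucket, "my-bucket", "no-meta.csv"))  # None
try:
    get_metadata(bucket, "my-bucket", "missing.csv")
except ValueError as exc:
    print(exc)  # Object (missing.csv) not found in bucket (my-bucket)
```

Only Case 1 raises; Cases 2 and 3 both return `blob.metadata` unchanged and differ only in what gets logged.
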
In fact, assuming `blob` exists, the only behavioral difference between a populated and an empty `blob.metadata` is the logging: when metadata is present, we log the number of fields found; when it is empty, we log that and return `None` without raising an error.
`blob.metadata is None` is a fundamentally different case from `blob is None`: `None` is not an acceptable value for `blob`, but `blob.metadata` is allowed to be `None`.
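
From the caller's side, this contract lets a missing object and a metadata-less object be told apart. A minimal sketch, assuming a fetcher with the same contract as the proposed `get_metadata` (raises `ValueError` when the object is missing, returns `None` when it has no metadata). `stub_get_metadata`, `metadata_status`, and the object names are hypothetical; in real code one would presumably pass a hook instance's bound `get_metadata` method instead of the stub. Catching `ValueError` also assumes the fetcher raises it only for a missing object.

```python
from __future__ import annotations

from typing import Callable

# Hypothetical in-memory objects; a value of None means "object exists, no metadata".
_OBJECTS = {
    "report.csv": {"owner": "etl", "source": "daily"},
    "raw.bin": None,
}


def stub_get_metadata(bucket_name: str, object_name: str) -> dict[str, str] | None:
    """Hypothetical stand-in with the same contract as the proposed get_metadata."""
    if object_name not in _OBJECTS:
        raise ValueError(f"Object ({object_name}) not found in bucket ({bucket_name})")
    return _OBJECTS[object_name]


def metadata_status(
    get_metadata: Callable[[str, str], dict[str, str] | None],
    bucket_name: str,
    object_name: str,
) -> str:
    """Classify an object without conflating 'missing object' and 'no metadata'."""
    try:
        metadata = get_metadata(bucket_name, object_name)
    except ValueError:
        # blob is None: the object itself does not exist.
        return "missing-object"
    if not metadata:
        # blob exists but blob.metadata is None (or empty): a normal state.
        return "no-metadata"
    return f"{len(metadata)} field(s)"


for name in ("report.csv", "raw.bin", "typo.csv"):
    print(name, "->", metadata_status(stub_get_metadata, "my-bucket", name))
# report.csv -> 2 field(s)
# raw.bin -> no-metadata
# typo.csv -> missing-object
```

Injecting the fetcher as a callable keeps the classification logic testable without credentials or network access.
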