jalengg commented on code in PR #38398:
URL: https://github.com/apache/airflow/pull/38398#discussion_r1536565254
##########
airflow/providers/google/cloud/hooks/gcs.py:
##########
@@ -1006,6 +1006,27 @@ def get_md5hash(self, bucket_name: str, object_name: str) -> str:
         self.log.info("The md5Hash of %s is %s", object_name, blob_md5hash)
         return blob_md5hash
 
+    def get_metadata(self, bucket_name: str, object_name: str) -> dict | None:
+        """
+        Get the metadata of an object in Google Cloud Storage.
+
+        :param bucket_name: Name of the Google Cloud Storage bucket where the object is.
+        :param object_name: The name of the object containing the desired metadata
+        :return: The metadata associated with the object
+        """
+        self.log.info("Retrieving the metadata dict of object (%s) in bucket (%s)", object_name, bucket_name)
+        client = self.get_conn()
+        bucket = client.bucket(bucket_name)
+        blob = bucket.get_blob(blob_name=object_name)
+        if blob is None:
+            raise ValueError("Object (%s) not found in bucket (%s)", object_name, bucket_name)

Review Comment:
   My intention was to raise an explicit error when the blob doesn't exist, because that prevents a `NoneType` attribute error further down.

   There are two separate cases here. It is reasonable for the blob to exist and for the user to want its metadata even when that metadata is empty (in that case the method returns `None`). In my opinion it is not reasonable for the user to pass an empty or non-existent blob to the function. So I think it's appropriate to raise an exception for the missing blob rather than let the `NoneType` attribute error surface.

   The existing `get_blob_update_time` method in `GCSHook` handles a missing blob in the same way.
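
The distinction drawn in the comment (missing object raises, existing object with no metadata returns `None`) can be sketched in isolation. This is only an illustration, not the PR's code: `FakeBlob` and `FakeBucket` are hypothetical stand-ins for the real GCS client objects, and the free function `get_metadata` is an assumed simplification of the hook method. The sketch also builds the message with an f-string, since Python exception constructors do not interpolate logging-style `%s` arguments.

```python
from __future__ import annotations


class FakeBlob:
    """Hypothetical stand-in for a GCS blob; only `metadata` is modelled."""

    def __init__(self, metadata: dict | None = None) -> None:
        self.metadata = metadata


class FakeBucket:
    """Hypothetical stand-in for a GCS bucket, backed by a plain dict."""

    def __init__(self, blobs: dict[str, FakeBlob]) -> None:
        self._blobs = blobs

    def get_blob(self, blob_name: str) -> FakeBlob | None:
        # Returns None for a missing object, which is the case the diff guards against.
        return self._blobs.get(blob_name)


def get_metadata(bucket: FakeBucket, bucket_name: str, object_name: str) -> dict | None:
    """Return the object's metadata, or None if the object has no metadata."""
    blob = bucket.get_blob(blob_name=object_name)
    if blob is None:
        # Missing object: fail loudly instead of hitting an AttributeError on None.
        raise ValueError(f"Object ({object_name}) not found in bucket ({bucket_name})")
    # Existing object: metadata may legitimately be None when nothing was set.
    return blob.metadata
```

With this shape, a caller can tell "object exists but has no metadata" (a `None` return) apart from "object does not exist" (a `ValueError`), which is the behaviour the comment argues for.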