molcay commented on code in PR #48107:
URL: https://github.com/apache/airflow/pull/48107#discussion_r2022960988
##########
providers/google/src/airflow/providers/google/cloud/hooks/gcs.py:
##########
@@ -726,6 +726,14 @@ def delete(self, bucket_name: str, object_name: str) ->
None:
self.log.info("Blob %s deleted.", object_name)
+ def get_bucket(self, bucket_name: str) -> storage.Bucket:
Review Comment:
This method makes sense. But when I check the rest of this file, I realize
that we are using the following snippet repeatedly:
```python
client = self.get_conn()
client.bucket(bucket_name)
```
I also see that some places, we are using the `client.bucket` like the
following
```python
client.bucket(bucket_name, user_project=user_project)
```
where `user_project` is a parameter of the function:
> :param user_project: The identifier of the Google Cloud project to bill
for the request. Required for Requester Pays buckets.
Do you think that we can add this optional parameter (`user_project`) to
this function?
##########
providers/google/src/airflow/providers/google/cloud/transfers/sftp_to_gcs.py:
##########
@@ -166,16 +173,22 @@ def _copy_single_object(
destination_object,
)
- with NamedTemporaryFile("w") as tmp:
- sftp_hook.retrieve_file(source_path, tmp.name,
prefetch=self.sftp_prefetch)
-
- gcs_hook.upload(
- bucket_name=self.destination_bucket,
- object_name=destination_object,
- filename=tmp.name,
- mime_type=self.mime_type,
- gzip=self.gzip,
- )
+ if self.use_stream:
+ dest_bucket = gcs_hook.get_bucket(self.destination_bucket)
+ dest_blob = dest_bucket.blob(destination_object)
Review Comment:
Since we are calling the `get_bucket` method and then calling the `.blob`
method on the bucket object, do you agree that we can create a method called
`blob` on the hook side to return the blob directly like the following:
```python
# hooks/gcs.py
def blob(self, bucket_name: str, object_name: str, user_project: str |
None = None):
bucket = self.get_bucket(self.destination_bucket, user_project)
return bucket.blob(object_name)
```
And then we can call it like the following here:
```suggestion
dest_blob = gcs_hook.blob(self.destination_bucket,
destination_object)
```
Also, if we implement the `blob` method in the GCSHook class then we do not
need to add the public `get_bucket` method.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]