josh-fell commented on code in PR #29694:
URL: https://github.com/apache/airflow/pull/29694#discussion_r1123963291
##########
airflow/providers/google/suite/hooks/drive.py:
##########
@@ -243,8 +300,21 @@ def upload_file(
.create(body=file_metadata, media_body=media, fields="id",
supportsAllDrives=True)
.execute(num_retries=self.num_retries)
)
- self.log.info("File %s uploaded to gdrive://%s.", local_location,
remote_location)
- return file.get("id")
+ file_id = file.get("id")
+
+ upload_location = remote_location
+
+ if folder_id != "root":
+ try:
+ upload_location = self._resolve_file_path(folder_id)
+ except (GoogleApiClientError, AirflowException) as e:
Review Comment:
I agree with @aru-trackunit. I don't think the task should fail simply
because the file path can't be resolved for logging purposes. It's unfortunate
the Google API doesn't surface this information easily. As long as the task
does what it's supposed to do, Airflow shouldn't care what is in its
user-facing task logs. Having this logic configurable also helps too.
I do agree with @eladkal though that `AirflowException` is irrelevant. The
operator is not failing for some orchestration or execution-related issue (IMO
I think we use `AirflowException` with too broad of a brush sometimes). Maybe a
custom exception is raised (e.g. `NestedFolderLimitReachedException` or
something similar) just in case some real AirflowException is swallowed
unknowingly?
##########
airflow/providers/google/suite/hooks/drive.py:
##########
@@ -159,6 +161,59 @@ def exists(
)
)
+ def _get_file_info(self, file_id: str):
+ """
+ Returns Google API file_info object containing id, name, parents in
the response
+ https://developers.google.com/drive/api/v3/reference/files/get
+
+ :param file_id: id as string representation of interested file
+ :return: file
+ """
+ file_info = (
+ self.get_conn()
+ .files()
+ .get(
+ fileId=file_id,
+ fields="id,name,parents",
+ supportsAllDrives=True,
+ )
+ .execute(num_retries=2)
+ )
+ return file_info
+
+ def _resolve_file_path(self, file_id: str) -> str:
+ """
+ Returns the full Google Drive path for given file_id
+
+ :param file_id: The id of a file in Google Drive
+ :return: Google Drive full path for a file
+ """
+ MAX_NESTED_FOLDERS_LEVEL = 20 # Link to docs
https://support.google.com/a/users/answer/7338880?hl=en
Review Comment:
This still feels magical. But, out of curiosity, if the documentation states
"A folder in a shared drive can have up to 20 levels of nested folders", does
that mean users _can't_ physically create more than that number of nested
folders or is it really a _shouldn't_ create situation?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]