dimberman commented on code in PR #29694:
URL: https://github.com/apache/airflow/pull/29694#discussion_r1117188933


##########
airflow/providers/google/suite/hooks/drive.py:
##########
@@ -159,6 +160,46 @@ def exists(
             )
         )
 
+    def _resolve_file_path(self, file_id: str) -> str:
+        """
+        Returns the full Google Drive path for given file_id
+
+        :param file_id: The id of a file in Google Drive
+        :return: Google Drive full path for a file
+        """
+        MAX_NESTED_FOLDERS_LEVEL = 20  # Link to docs 
https://support.google.com/a/users/answer/7338880?hl=en
+        iteration = 1
+
+        service = self.get_conn()
+        has_reached_root = False
+        _file_id = file_id
+        path: str = ""
+        while not has_reached_root:
+            file_info = (
+                service.files()
+                .get(
+                    fileId=_file_id,
+                    fields="id,name,parents",
+                    supportsAllDrives=True,
+                )
+                .execute()
+            )
+            if "parents" in file_info:
+                parent_directories = file_info["parents"]
+                path = f'{file_info["name"]}' if path == "" else 
f'{file_info["name"]}/{path}'
+
+                if len(parent_directories) > 1:
+                    self.log.warning("Google returned multiple parents, 
picking first")
+                _file_id = parent_directories[0]
+            else:
+                has_reached_root = True
+                path = f'{file_info["name"]}/{path}'

Review Comment:
   This code section is somewhat hard to read. Could you please either a) break 
this out into helper functions, b) write in some comments explaining exactly 
what's Happening Here, or c) both?



##########
airflow/providers/google/suite/hooks/drive.py:
##########
@@ -243,8 +284,13 @@ def upload_file(
             .create(body=file_metadata, media_body=media, fields="id", 
supportsAllDrives=True)
             .execute(num_retries=self.num_retries)
         )
-        self.log.info("File %s uploaded to gdrive://%s.", local_location, 
remote_location)
-        return file.get("id")
+        file_id = file.get("id")
+
+        upload_location = (

Review Comment:
   Well two questions about this:
   
   1. In a scenario where user only has permission to look at part of the tree, 
would the API throw an error or would we just not be able to see it at all?
   
   2. Are there people who have such deep traversal trees in their Google Drive 
that this would be a large amount of time? How common is that?



##########
airflow/providers/google/suite/hooks/drive.py:
##########
@@ -243,8 +284,13 @@ def upload_file(
             .create(body=file_metadata, media_body=media, fields="id", 
supportsAllDrives=True)
             .execute(num_retries=self.num_retries)
         )
-        self.log.info("File %s uploaded to gdrive://%s.", local_location, 
remote_location)
-        return file.get("id")
+        file_id = file.get("id")
+
+        upload_location = (

Review Comment:
   I think it would be worth us investigating whether this is actually a 
scenario that could happen in Google drive as I have a feeling that the Google 
drive filesystem is probably closer to an s3 than a POSIX



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to