GitHub user JhonGarcia0 created a discussion: Suggestion / Hook Improvement:
GoogleDriveHook.get_file_id
Hi team! I'm new around this channel, but I've been using Apache Airflow for
quite some time now. Today I ran into an issue that drove me a bit crazy, and I
wanted to share a suggestion that might improve the developer experience and
functionality of the GoogleDriveHook, specifically the get_file_id method in:
airflow.providers.google.suite.hooks.drive
When using a service account to search for files, I was unable to find a file
that I knew existed. After digging for quite a while, I discovered two small
(but important) behaviors that caused the issue.
1. Operand Order Matters in the `in parents` Clause
Current code:
```
if folder_id:
    query += f" and parents in '{folder_id}'"
```
This syntax caused my service account to never find the file, even though it
had access. After trying different formats and reviewing the Drive API
documentation, I realized the correct (and working) format is:
```
if folder_id:
    query += f" and '{folder_id}' in parents"
```
This small change resolved the issue immediately. While subtle, the order in
the query clause is critical when searching by parent folders.
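To make the difference concrete, here is a minimal standalone sketch of the query construction with the folder ID placed on the left of `in parents`. The helper name `build_file_query` and the `name = '...'` base clause are assumptions I made for illustration; this is not the hook's actual code.

```python
from typing import Optional


def build_file_query(file_name: str, folder_id: Optional[str] = None) -> str:
    """Build a Drive search query string (hypothetical helper, not part of the hook)."""
    # Assumed base clause: match the file by its exact name.
    query = f"name = '{file_name}'"
    if folder_id:
        # The folder ID is the left operand and `parents` the right one.
        query += f" and '{folder_id}' in parents"
    return query


print(build_file_query("report.csv", "abc123"))
```

The printed string is what would be passed as `q=` to `files().list()`.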
2. Missing Support for Files Shared with the Service Account ("Shared with me")
Another issue arises when searching for files that are shared with the service
account, but not in the service account's personal Drive or a shared drive
(i.e., "Shared with me").
Currently, the code does:
```
else:
    files = (
        service.files()
        .list(
            q=query,
            spaces="drive",
            fields="files(id, mimeType)",
            orderBy="modifiedTime desc"
        )
        .execute(num_retries=self.num_retries)
    )
```
But this only searches within the user's personal Drive (or within a shared
drive if a drive ID is explicitly passed). When files are shared directly with
the service account (without a driveId), they are not found by default.
To support this use case, I propose adding a new flag (for example,
`shared_folder`, or another clearly named parameter) to explicitly enable
searching in **shared folders** (i.e., files shared with the service account
but not located in its own Drive or in a specific shared drive).
When this flag is set to `True`, the following parameters should be included in
the `files().list()` call to ensure the Drive API can search across all
available drives:
```
includeItemsFromAllDrives=True,
supportsAllDrives=True
```
This enhancement would make the hook more robust and flexible, especially for
users relying on service accounts accessing files shared from external sources.
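As a rough sketch of how the flag could be wired in, the snippet below builds the keyword arguments for the `files().list()` call shown earlier and adds the two extra parameters only when the flag is set. The function name `build_list_kwargs` and the flag name `shared_folder` are hypothetical; the final naming would be up to the maintainers.

```python
from typing import Any, Dict


def build_list_kwargs(query: str, shared_folder: bool = False) -> Dict[str, Any]:
    """Assemble arguments for files().list() (hypothetical helper, not in the hook)."""
    # Same arguments as the current `else` branch.
    kwargs: Dict[str, Any] = {
        "q": query,
        "spaces": "drive",
        "fields": "files(id, mimeType)",
        "orderBy": "modifiedTime desc",
    }
    if shared_folder:
        # Proposed addition: only sent when the caller opts in.
        kwargs["includeItemsFromAllDrives"] = True
        kwargs["supportsAllDrives"] = True
    return kwargs


# Inside the hook this could then be used roughly as:
#   service.files().list(**build_list_kwargs(query, shared_folder=True))
#       .execute(num_retries=self.num_retries)
print(build_list_kwargs("name = 'report.csv'", shared_folder=True))
```

Keeping the default at `False` means existing DAGs would keep their current behavior unless they opt in.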
Thank you for your attention, and I hope this suggestion can be taken into
consideration.
If you need any additional information or clarification, I'm happy to help and
answer any questions this might raise.
GitHub link: https://github.com/apache/airflow/discussions/56487