GitHub user JhonGarcia0 created a discussion: 🔧 Suggestion / Hook Improvement: 
GoogleDriveHook.get_file_id

Hi team! I'm new around this channel, but I've been using Apache Airflow for 
quite some time now. Today I ran into an issue that drove me a bit crazy, and I 
wanted to share a suggestion that might improve the developer experience and 
functionality of the GoogleDriveHook, specifically the get_file_id method in:

airflow.providers.google.suite.hooks.drive

When using a service account to search for files, I was unable to find a file 
that I knew existed. After digging for quite a while, I discovered two small 
(but important) behaviors that caused the issue.

1. Operand Order Matters in the `in parents` Clause

Current code:

```
if folder_id:
    query += f" and parents in '{folder_id}'"
```

This syntax caused my service account to never find the file, even though it 
had access. After trying different formats and reviewing the Drive API 
documentation, I realized the correct (and working) format is:

```
if folder_id:
    query += f" and '{folder_id}' in parents"
```

This small change resolved the issue immediately. While subtle, the order in 
the query clause is critical when searching by parent folders.
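As a rough illustration of the corrected clause, here is a minimal standalone sketch. The helper names `build_file_query` and `_escape` are hypothetical and not part of the hook; the quote escaping follows the Drive query rule that single quotes inside a value are written as `\'`.

```python
from typing import Optional


def _escape(value: str) -> str:
    """Escape backslashes and single quotes for a Drive query string literal."""
    return value.replace("\\", "\\\\").replace("'", "\\'")


def build_file_query(file_name: str, folder_id: Optional[str] = None) -> str:
    """Hypothetical helper: build the `q` string for files().list()."""
    query = f"name = '{_escape(file_name)}'"
    if folder_id:
        # The folder ID is the left operand; `parents` is the collection on the right.
        query += f" and '{_escape(folder_id)}' in parents"
    return query
```

Calling `build_file_query("report.csv", "abc123")` gives `name = 'report.csv' and 'abc123' in parents`, which matches the working format above.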

2. Missing Support for Files Shared With the Service Account ("Shared with me")

Another issue arises when searching for files that are shared with the service 
account, but not in the service account's personal Drive or a shared drive 
(i.e., "Shared with me").

Currently, the code does:
```
else:
  files = (
      service.files()
      .list(
          q=query,
          spaces="drive",
          fields="files(id, mimeType)",
          orderBy="modifiedTime desc"
      )
      .execute(num_retries=self.num_retries)
  )
```

This only searches within the user's personal Drive (or within a shared 
drive if one is explicitly passed). When files are shared directly with the 
service account (without a driveId), they are not found by default.

To support this use case, I propose adding a new flag (for example 
`shared_folder`, or another clearly named parameter) to explicitly enable 
searching in **shared folders**, i.e., files shared with the service account 
but not located in its own Drive or in a specific shared drive.

When this flag is set to `True`, the following parameters should be included in 
the `files().list()` call to ensure the Drive API can search across all 
available drives:

```
includeItemsFromAllDrives=True,
supportsAllDrives=True
```

This enhancement would make the hook more robust and flexible, especially for 
users relying on service accounts accessing files shared from external sources.


Thank you for your attention, and I hope this suggestion can be taken into 
consideration. If you need any additional information or clarification, I'm 
happy to help and available to answer any questions this might raise.


GitHub link: https://github.com/apache/airflow/discussions/56487
