atrbgithub opened a new issue, #34909:
URL: https://github.com/apache/airflow/issues/34909
### Apache Airflow version
Other Airflow 2 version (please specify below)
### What happened
This affects Airflow 2.7.2. It appears that version 10.9.0 of
`apache-airflow-providers-google` fails to list objects in GCS.
Example to recreate:
```shell
pipenv --python 3.8
pipenv shell
pip install apache-airflow==2.7.2 apache-airflow-providers-google==10.9.0
export AIRFLOW_CONN_GOOGLE_CLOUD_DEFAULT='google-cloud-platform://'
```
Then create the following python test file:
```python
from airflow.providers.google.cloud.hooks.gcs import GCSHook
result = GCSHook().list(
    bucket_name="a-test-bucket",
    prefix="a/test/prefix",
    delimiter=".csv",
)
result = list(result)
print(result)
```
The output of this is:
```
[]
```
In a different pipenv environment, this works when using Airflow 2.7.1 and
the 10.7.0 version of the provider:
```shell
pipenv --python 3.8
pipenv shell
pip install apache-airflow==2.7.1 apache-airflow-providers-google==10.7.0
export AIRFLOW_CONN_GOOGLE_CLOUD_DEFAULT='google-cloud-platform://'
```
Use the same python test file as above. The output of this is a list of
files as expected.
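For context on why a `.csv` delimiter should yield file names at all, here is a rough pure-Python emulation of how GCS documents `delimiter` grouping: object names that contain the delimiter after the prefix are truncated just past it and reported as prefixes, while the remaining names are reported as ordinary items. The function `split_by_delimiter` and the object names are hypothetical and exist only for illustration; nothing here calls the real API.

```python
from typing import Iterable, List, Tuple


def split_by_delimiter(
    names: Iterable[str], prefix: str, delimiter: str
) -> Tuple[List[str], List[str]]:
    """Emulate delimiter grouping for object listing (illustrative only).

    Returns (items, prefixes): names without the delimiter after the prefix
    stay in `items`; names with it are cut just past the delimiter and
    collected, de-duplicated, in `prefixes`.
    """
    items: List[str] = []
    prefixes: List[str] = []
    for name in sorted(names):
        if not name.startswith(prefix):
            continue  # outside the requested prefix
        rest = name[len(prefix):]
        cut = rest.find(delimiter)
        if cut == -1:
            items.append(name)
        else:
            collapsed = prefix + rest[: cut + len(delimiter)]
            if collapsed not in prefixes:
                prefixes.append(collapsed)
    return items, prefixes


# Hypothetical bucket contents.
names = [
    "a/test/prefix/one.csv",
    "a/test/prefix/two.csv",
    "a/test/prefix/notes.txt",
    "other/x.csv",
]
items, prefixes = split_by_delimiter(names, "a/test/prefix", ".csv")
print(items)
print(prefixes)
```

Under this model the `.csv` objects show up only in the prefixes, so a caller that never sees the prefixes would get nothing back for them.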
[This commit](https://github.com/apache/airflow/commit/3fa9d46ec74ef8453fcf17fbd49280cb6fb37cef#diff-82854006b5553665046db26d43a9dfa90bec78d4ba93e2d2ca7ff5bf632fa624R832)
appears to be the one that may have introduced the regression.
The `hooks/gcs.py` file can be patched as follows. Iterating over `blobs`
before `blobs.prefixes` is read appears to force the lazy loading to kick in:
```python
# Fragment from inside the listing loop in hooks/gcs.py (hence the `break`).
print("Forcing loading....")
all_blobs = list(blobs)
for blob in all_blobs:
    print(blob.name)
if blobs.prefixes:
    ids.extend(blobs.prefixes)
else:
    ids.extend(blob.name for blob in all_blobs)
page_token = blobs.next_page_token
if page_token is None:
    # empty next page token
    break
```
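A minimal sketch of why the order matters, assuming the iterator returned by `bucket.list_blobs()` only fills in `prefixes` once it is actually iterated. `FakeLazyBlobs`, `FakeBlob`, and both helper functions are hypothetical stand-ins written for this illustration and are not part of the provider or of `google-cloud-storage`.

```python
from typing import Iterator, List, Set


class FakeBlob:
    def __init__(self, name: str) -> None:
        self.name = name


class FakeLazyBlobs:
    """Hypothetical stand-in for the listing iterator.

    Assumption: `prefixes` stays empty until iteration starts, which is
    when the (simulated) request is made.
    """

    def __init__(self, names: List[str], prefixes: List[str]) -> None:
        self._names = names
        self._pending_prefixes = prefixes
        self.prefixes: Set[str] = set()
        self.next_page_token = None

    def __iter__(self) -> Iterator[FakeBlob]:
        # Prefixes become visible only once iteration begins.
        self.prefixes.update(self._pending_prefixes)
        for name in self._names:
            yield FakeBlob(name)


def ids_checking_prefixes_first(blobs: FakeLazyBlobs) -> List[str]:
    """Reads `prefixes` before iterating, so it is still empty."""
    ids: List[str] = []
    if blobs.prefixes:
        ids.extend(blobs.prefixes)
    else:
        ids.extend(blob.name for blob in blobs)
    return ids


def ids_iterating_first(blobs: FakeLazyBlobs) -> List[str]:
    """Materializes the blobs first, as the patch does, then reads `prefixes`."""
    all_blobs = list(blobs)
    ids: List[str] = []
    if blobs.prefixes:
        ids.extend(blobs.prefixes)
    else:
        ids.extend(blob.name for blob in all_blobs)
    return ids


# With a ".csv" delimiter every matching object is reported as a prefix,
# so the fake listing has no plain items at all.
csv_prefixes = ["a/test/prefix/file1.csv"]
print(ids_checking_prefixes_first(FakeLazyBlobs([], csv_prefixes)))  # []
print(ids_iterating_first(FakeLazyBlobs([], csv_prefixes)))  # ['a/test/prefix/file1.csv']
```

If the real iterator behaves like this stand-in, the empty result reported above is what the first ordering would produce.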
Example patch file:
```diff
+++ gcs.py 2023-10-12 11:34:00.774206013 +0000
@@ -829,12 +829,19 @@
                     versions=versions,
                 )
+            print("Forcing loading....")
+            all_blobs = list(blobs)
+
+            for blob in all_blobs:
+                print(blob.name)
+
             if blobs.prefixes:
                 ids.extend(blobs.prefixes)
             else:
-                ids.extend(blob.name for blob in blobs)
+                ids.extend(blob.name for blob in all_blobs)
             page_token = blobs.next_page_token
+
             if page_token is None:
                 # empty next page token
                 break
```
### What you think should happen instead
The provider should be able to list files in GCS.
### How to reproduce
Please see above for the steps to reproduce.
### Operating System
n/a
### Versions of Apache Airflow Providers
10.9.0 of the google provider.
### Deployment
Other 3rd-party Helm chart
### Deployment details
_No response_
### Anything else
_No response_
### Are you willing to submit PR?
- [X] Yes I am willing to submit a PR!
### Code of Conduct
- [X] I agree to follow this project's [Code of
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)