nyoungstudios opened a new issue, #37576:
URL: https://github.com/apache/airflow/issues/37576
### Apache Airflow Provider(s)
google
### Versions of Apache Airflow Providers
apache-airflow-providers-google==10.10.0
### Apache Airflow version
2.6.3
### Operating System
Debian 11
### Deployment
Google Cloud Composer
### Deployment details
Reproducible locally in a Docker image whose Python environment is installed
with conda, run inside our VS Code dev container (Local executor, Postgres
database). The same error occurs in our Google Cloud Composer deployment (k8s,
Postgres, and Celery executor). We can provide the full pip install list and
the Dockerfile if needed.
### What happened
The result of `GCSToGCSOperator` differs based on which files exist in the
source bucket. It also differs from the result of the equivalent `gsutil mv`
command. I believe this is because the `GCSToGCSOperator` treats moving a
single object differently than moving multiple objects.
### What you think should happen instead
The `GCSToGCSOperator` should match what the `gsutil mv` command does.
### How to reproduce
## Overview
### Airflow operator usage
Here is our example usage of this operator.
```python
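from airflow.providers.google.cloud.transfers.gcs_to_gcs import GCSToGCSOperator

GCSToGCSOperator(
    task_id="move-files",
    source_bucket="bucket-name",
    source_object="folder/nested_folder/",  # prefix, not a single object
    destination_bucket="bucket-name-2",
    destination_object="folder/nested_folder/",
    move_object=True,  # delete each source object after it is copied
)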
```
### gsutil mv usage
Here is our example usage of the gsutil mv command.
```bash
gsutil -m mv gs://bucket-name/folder/nested_folder \
  gs://bucket-name-2/folder/nested_folder
```
## Test 1: Expected result
Given that these files exist before running the task.
```bash
> gsutil -m ls "gs://bucket-name/folder/nested_folder/**"
gs://bucket-name/folder/nested_folder/aaaa/bbbb/cccc/12345.txt
gs://bucket-name/folder/nested_folder/aaaa/bbbb/cccc/67890.txt
```
The Airflow `GCSToGCSOperator` task will move
- `gs://bucket-name/folder/nested_folder/aaaa/bbbb/cccc/12345.txt` to
`gs://bucket-name-2/folder/nested_folder/aaaa/bbbb/cccc/12345.txt`
- `gs://bucket-name/folder/nested_folder/aaaa/bbbb/cccc/67890.txt` to
`gs://bucket-name-2/folder/nested_folder/aaaa/bbbb/cccc/67890.txt`
This matches what the equivalent gsutil command would do.
## Test 2: Unexpected result
Given that these files exist before running the task.
```bash
> gsutil -m ls "gs://bucket-name/folder/nested_folder/**"
gs://bucket-name/folder/nested_folder/aaaa/bbbb/cccc/12345.txt
```
The Airflow `GCSToGCSOperator` task will move
- `gs://bucket-name/folder/nested_folder/aaaa/bbbb/cccc/12345.txt` to
`gs://bucket-name-2/folder/nested_folder/12345.txt`, which doesn't retain the
nested folder structure like the first test.
This does not match what the equivalent gsutil command would do. The gsutil
mv command would correctly move
- `gs://bucket-name/folder/nested_folder/aaaa/bbbb/cccc/12345.txt` to
`gs://bucket-name-2/folder/nested_folder/aaaa/bbbb/cccc/12345.txt`.
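
The mapping expected in both tests can be sketched as a plain prefix
replacement that does not depend on how many objects match. This is only an
illustration of the expected behavior: `expected_destination` is a
hypothetical helper written for this report, not part of the provider, and it
makes no claim about how `GCSToGCSOperator` is implemented internally.

```python
def expected_destination(source_name: str, source_prefix: str, destination_prefix: str) -> str:
    """Map a source object name to its destination name.

    Hypothetical helper: everything after ``source_prefix`` is kept as-is,
    so the nested folder structure is preserved.
    """
    if not source_name.startswith(source_prefix):
        raise ValueError(f"{source_name!r} is not under {source_prefix!r}")
    relative_path = source_name[len(source_prefix):]
    return destination_prefix + relative_path


SOURCE_PREFIX = "folder/nested_folder/"
DESTINATION_PREFIX = "folder/nested_folder/"

# Test 1 (two objects) and Test 2 (one object) should map the same way.
test_1_objects = [
    "folder/nested_folder/aaaa/bbbb/cccc/12345.txt",
    "folder/nested_folder/aaaa/bbbb/cccc/67890.txt",
]
test_2_objects = [
    "folder/nested_folder/aaaa/bbbb/cccc/12345.txt",
]

for objects in (test_1_objects, test_2_objects):
    for name in objects:
        print(name, "->", expected_destination(name, SOURCE_PREFIX, DESTINATION_PREFIX))
```

With this mapping, the single object in Test 2 keeps its `aaaa/bbbb/cccc/`
path, which is the result `gsutil mv` produced in the tests above.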
### Anything else
Here is the gcloud version output from my tests above.
```bash
> gcloud version
Google Cloud SDK 453.0.0
alpha 2023.10.27
beta 2023.10.27
bq 2.0.98
bundled-python3-unix 3.9.17
core 2023.10.27
gcloud-crc32c 1.0.0
gke-gcloud-auth-plugin 0.5.6
gsutil 5.27
```
### Are you willing to submit PR?
- [X] Yes I am willing to submit a PR!
### Code of Conduct
- [X] I agree to follow this project's [Code of
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)