nyoungstudios opened a new issue, #37576:
URL: https://github.com/apache/airflow/issues/37576

   ### Apache Airflow Provider(s)
   
   google
   
   ### Versions of Apache Airflow Providers
   
   apache-airflow-providers-google==10.10.0
   
   ### Apache Airflow version
   
   2.6.3
   
   ### Operating System
   
   Debian 11
   
   ### Deployment
   
   Google Cloud Composer
   
   ### Deployment details
   
   Reproducible locally in a Docker-based VS Code dev container with a Python 
environment installed via conda (local executor, Postgres database). The same 
error occurs in our Google Cloud Composer deployment (k8s, Postgres, and Celery 
executor). I can provide the full pip install with Dockerfile if needed.
   
   ### What happened
   
   The result of `GCSToGCSOperator` differs based on which source files exist 
in the source bucket, and it also differs from the result of the equivalent 
`gsutil mv` command. I believe this is because `GCSToGCSOperator` treats moving 
a single object differently from moving multiple objects.
   
   ### What you think should happen instead
   
   The `GCSToGCSOperator` should match what the `gsutil mv` command does.
   
   ### How to reproduce
   
   ## Overview
   
   ### Airflow operator usage
   
   Here is our example usage of this operator.
   
   ```python
   GCSToGCSOperator(
       task_id="move-files",
       source_bucket="bucket-name",
       source_object="folder/nested_folder/",
       destination_bucket="bucket-name-2",
       destination_object="folder/nested_folder/",
       move_object=True,
   )
   ```
   
   ### gsutil mv usage
   
   Here is our example usage of the gsutil mv command.
   
   ```bash
   gsutil -m mv gs://bucket-name/folder/nested_folder gs://bucket-name-2/folder/nested_folder
   ```
   
   
   ## Test 1: Expected result
   
   Given that these files exist before running the task:
   ```bash
   > gsutil -m ls "gs://bucket-name/folder/nested_folder/**"
   gs://bucket-name/folder/nested_folder/aaaa/bbbb/cccc/12345.txt
   gs://bucket-name/folder/nested_folder/aaaa/bbbb/cccc/67890.txt
   ```
   
   The Airflow `GCSToGCSOperator` task will move
   - `gs://bucket-name/folder/nested_folder/aaaa/bbbb/cccc/12345.txt` to 
`gs://bucket-name-2/folder/nested_folder/aaaa/bbbb/cccc/12345.txt`
   - `gs://bucket-name/folder/nested_folder/aaaa/bbbb/cccc/67890.txt` to 
`gs://bucket-name-2/folder/nested_folder/aaaa/bbbb/cccc/67890.txt`
   
   This matches what the equivalent gsutil command would do.
   
   
   ## Test 2: Unexpected result
   
   Given that these files exist before running the task:
   ```bash
   > gsutil -m ls "gs://bucket-name/folder/nested_folder/**"
   gs://bucket-name/folder/nested_folder/aaaa/bbbb/cccc/12345.txt
   ```
   
   The Airflow `GCSToGCSOperator` task will move
   - `gs://bucket-name/folder/nested_folder/aaaa/bbbb/cccc/12345.txt` to 
`gs://bucket-name-2/folder/nested_folder/12345.txt`, which doesn't retain the 
nested folder structure as in the first test.
   
   This does not match what the equivalent gsutil command would do. The gsutil 
mv command would correctly move 
   - `gs://bucket-name/folder/nested_folder/aaaa/bbbb/cccc/12345.txt` to 
`gs://bucket-name-2/folder/nested_folder/aaaa/bbbb/cccc/12345.txt`.
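
The mapping I would expect in both tests can be sketched as a plain prefix swap on the object name. This is only an illustration of the expected result: `expected_destination` is a hypothetical helper written for this report, not part of the provider, and it does not reflect how `GCSToGCSOperator` is implemented.

```python
def expected_destination(source_object: str, source_prefix: str, destination_prefix: str) -> str:
    """Hypothetical helper: swap the source prefix for the destination prefix.

    The part of the object name after the prefix is kept as-is, so the
    nested folder structure is kept no matter how many objects match.
    """
    if not source_object.startswith(source_prefix):
        raise ValueError(f"{source_object!r} does not start with {source_prefix!r}")
    return destination_prefix + source_object[len(source_prefix):]


# Test 2 from this report: a single object under the source prefix.
single = "folder/nested_folder/aaaa/bbbb/cccc/12345.txt"
print(expected_destination(single, "folder/nested_folder/", "folder/nested_folder/"))
```

With this mapping, the single-object case and the multi-object case produce destination names by the same rule, which is the behavior described for `gsutil mv` above.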
   
   ### Anything else
   
   Here is the gcloud version output from my tests above.
   
   ```bash
   > gcloud version
   Google Cloud SDK 453.0.0
   alpha 2023.10.27
   beta 2023.10.27
   bq 2.0.98
   bundled-python3-unix 3.9.17
   core 2023.10.27
   gcloud-crc32c 1.0.0
   gke-gcloud-auth-plugin 0.5.6
   gsutil 5.27
   ```
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
