HaJunYoo opened a new issue, #56134:
URL: https://github.com/apache/airflow/issues/56134

   ### Description
   
   Add a `flatten_structure` parameter to `GCSToS3Operator` that strips the 
directory structure from transferred objects, so each file is uploaded to the 
S3 destination path under its filename only.
   
   ### Use case/motivation
   
   **Current Behavior:**
   The `GCSToS3Operator` always preserves the full GCS object path (including 
the prefix) when uploading to S3, regardless of the `keep_directory_structure` 
setting. 
   
   For example:
   ```python
   GCSToS3Operator(
       task_id="gcs_to_s3",
       gcs_bucket="my-bucket",
       prefix="data/2025/01/15/file.parquet",
       dest_s3_key="s3://target-bucket/processed/",
   )
   # Results in: s3://target-bucket/processed/data/2025/01/15/file.parquet
   ```
   
   This makes it impossible to reorganize file structure during transfer 
without creating intermediate buckets or complex workarounds.
   
   **Desired Behavior:**
   With `flatten_structure=True`, only the filename would be appended to `dest_s3_key`:
   ```python
   GCSToS3Operator(
       task_id="gcs_to_s3",
       gcs_bucket="my-bucket",
       prefix="data/2025/01/15/file.parquet",
       dest_s3_key="s3://target-bucket/processed/2025/01/15/",
       flatten_structure=True,
   )
   # Results in: s3://target-bucket/processed/2025/01/15/file.parquet
   ```
   
   **Implementation:**
   ```python
   import os

   def _transform_file_path(self, file_path: str) -> str:
       # When flattening, keep only the last path component (the filename).
       if self.flatten_structure:
           return os.path.basename(file_path)
       return file_path
   ```
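   
To illustrate the intended key mapping outside the operator, here is a minimal standalone sketch. `build_dest_key` is a hypothetical helper, not an existing Airflow function, and it assumes `dest_s3_key` has already been reduced to a plain key prefix. It uses `posixpath` rather than `os.path` because GCS object names are separated by `/` on every platform.

```python
import posixpath


def build_dest_key(dest_prefix: str, gcs_object: str, flatten_structure: bool = False) -> str:
    """Hypothetical helper: compute the S3 key for a single GCS object."""
    # With flattening, drop every directory component of the GCS object name;
    # otherwise keep the full object name under the destination prefix.
    relative = posixpath.basename(gcs_object) if flatten_structure else gcs_object
    return posixpath.join(dest_prefix, relative)


# Flattened: only the filename lands under the destination prefix.
print(build_dest_key("processed/2025/01/15/", "data/2025/01/15/file.parquet", flatten_structure=True))
# Not flattened: the full GCS object path is kept.
print(build_dest_key("processed/", "data/2025/01/15/file.parquet"))
```

One consequence of this approach is that two objects with the same filename in different GCS directories would map to the same S3 key when flattened, so the actual implementation may need to decide how to handle that case.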
   
   This feature enables:
   - Flexible path reorganization during cross-cloud transfers
   - Cleaner S3 directory structures without GCS-specific paths
   - Simplified data pipeline architectures without intermediate storage
   
   
   ### Related issues
   
   _No response_
   
   ### Are you willing to submit a PR?
   
   - [x] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

