HaJunYoo opened a new issue, #56134:
URL: https://github.com/apache/airflow/issues/56134
### Description
Add a `flatten_structure` parameter to `GCSToS3Operator` that strips the
directory structure from transferred objects, so that each file is uploaded
directly under the S3 destination path using only its filename.
### Use case/motivation
**Current Behavior:**
The `GCSToS3Operator` always preserves the full GCS object path (including
the prefix) when uploading to S3, regardless of the `keep_directory_structure`
setting.
For example:
```python
GCSToS3Operator(
    gcs_bucket="my-bucket",
    prefix="data/2025/01/15/file.parquet",
    dest_s3_key="s3://target-bucket/processed/",
)
# Results in: s3://target-bucket/processed/data/2025/01/15/file.parquet
```
This makes it impossible to reorganize file structure during transfer
without creating intermediate buckets or complex workarounds.
**Desired Behavior:**
With `flatten_structure=True`, each object would be uploaded under its filename only, directly beneath the destination path:
```python
GCSToS3Operator(
    gcs_bucket="my-bucket",
    prefix="data/2025/01/15/file.parquet",
    dest_s3_key="s3://target-bucket/processed/2025/01/15/",
    flatten_structure=True,
)
# Results in: s3://target-bucket/processed/2025/01/15/file.parquet
```
**Implementation:**
```python
import os

def _transform_file_path(self, file_path: str) -> str:
    # With flatten_structure enabled, keep only the final path component.
    if self.flatten_structure:
        return os.path.basename(file_path)
    return file_path
```
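To make the intended mapping concrete, here is a minimal standalone sketch of the same idea outside the operator. The helper names `flatten_key` and `build_dest_key` are hypothetical and not part of the provider. The sketch uses `posixpath` rather than `os.path` on the assumption that object keys always use `/` as the separator, whatever OS the worker runs on.

```python
import posixpath


def flatten_key(gcs_key: str, flatten_structure: bool) -> str:
    """Return the object name to append to the S3 destination prefix."""
    if flatten_structure:
        # Object keys use "/" as the separator regardless of the host OS.
        return posixpath.basename(gcs_key)
    return gcs_key


def build_dest_key(dest_prefix: str, gcs_key: str, flatten_structure: bool = False) -> str:
    """Join the destination prefix and the (optionally flattened) key."""
    return dest_prefix.rstrip("/") + "/" + flatten_key(gcs_key, flatten_structure)


# Hypothetical usage mirroring the examples above.
flat = build_dest_key("processed/2025/01/15/", "data/2025/01/15/file.parquet", True)
kept = build_dest_key("processed/", "data/2025/01/15/file.parquet", False)
```

With flattening enabled only `file.parquet` is appended to the destination prefix. With it disabled the full source key is appended, which matches the current behavior described above.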
This feature enables:
- Flexible path reorganization during cross-cloud transfers
- Cleaner S3 directory structures without GCS-specific paths
- Simplified data pipeline architectures without intermediate storage
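
One caveat worth handling in the PR: when a prefix matches several objects, flattening can map different source keys to the same filename, and later uploads would overwrite earlier ones. Below is a rough sketch of a pre-flight check. The helper name `find_flatten_collisions` is hypothetical, and how the operator should react (fail, warn, or skip) is left open.

```python
import posixpath
from collections import defaultdict


def find_flatten_collisions(gcs_keys: list[str]) -> dict[str, list[str]]:
    """Map each flattened filename to the source keys sharing it (duplicates only)."""
    by_name: dict[str, list[str]] = defaultdict(list)
    for key in gcs_keys:
        by_name[posixpath.basename(key)].append(key)
    # Keep only filenames produced by more than one source key.
    return {name: keys for name, keys in by_name.items() if len(keys) > 1}


# Hypothetical listing: two objects in different folders share a filename.
collisions = find_flatten_collisions(["a/x.csv", "b/x.csv", "a/y.csv"])
```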
### Related issues
_No response_
### Are you willing to submit a PR?
- [x] Yes I am willing to submit a PR!
### Code of Conduct
- [x] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)