bujji8411 opened a new issue, #60853: URL: https://github.com/apache/airflow/issues/60853
### Description

**Context**

The GcsToGcsOperator in the Google provider supports object copy/move semantics but does not currently allow setting object-level retention metadata (a retention period or expiration timestamp) on destination objects. In regulated or compliance-driven workflows, retention must be enforced immediately after an object is created in the destination bucket. Today, this requires an additional operator or a custom Python task using the GCS client.

**Problem with the current approach**

Using two steps (copy → set retention) introduces:

- Partial-success failure modes (the object is copied successfully, but the retention update fails)
- Additional retries and operational overhead
- More complex DAGs for what is logically a single concern

While the GCS copy operation itself is atomic, the workflow-level behavior is not.

**Proposed enhancement**

Add an optional, backward-compatible parameter to GcsToGcsOperator that sets object-level retention metadata on destination objects immediately after the copy/move (a hypothetical usage sketch is included at the end of this issue). The scope would be intentionally limited to:

- Object-level retention only
- No bucket-level retention changes
- No lifecycle rule management

Default behavior remains unchanged.

### Use case/motivation

In regulated and compliance-driven data pipelines (e.g. audit logs, financial records, contractual artifacts), object-level retention must be enforced immediately upon creation in the destination bucket. A common pattern is copying or moving objects between GCS buckets with different access boundaries (raw → curated, staging → compliance, regional → central). In these cases:

- Bucket-level retention cannot always be used (shared buckets, mixed data classes)
- Lifecycle rules may not be granular or immediate enough
- Retention must be applied per object, deterministically

**Current limitation**

Today, enforcing object-level retention requires a second step after GcsToGcsOperator completes (e.g. another operator or custom Python code using the GCS client), as sketched below. This leads to:

- Partial-success scenarios (the object is copied, but the retention update fails or is retried)
- Additional retry and idempotency handling in DAGs
- More complex workflows for a logically single operation

While the underlying GCS copy/move is atomic, the DAG-level workflow is not, which is problematic in compliance-sensitive pipelines.

**Why in the same operator**

An optional, object-level retention parameter in GcsToGcsOperator would:

- Reduce DAG complexity and failure modes
- Make retention enforcement explicit and colocated with data movement
- Remain fully backward compatible (default behavior unchanged)

**Explicit non-goals**

- No bucket-level retention changes
- No lifecycle rule management
- No overlap with Storage Transfer Service operators

This proposal is intentionally scoped to a small, optional enhancement that improves correctness and operability for compliance-driven users without changing existing behavior.

### Related issues

_No response_

### Are you willing to submit a PR?

- [ ] Yes I am willing to submit a PR!

### Code of Conduct

- [x] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
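For reference, a minimal sketch of the current two-step workaround, assuming the operator in question is `GCSToGCSOperator` from `airflow.providers.google.cloud.transfers.gcs_to_gcs`. Bucket names, object names, and the retention window are placeholders, and the `blob.retention` fields used to set object-level retention are an assumption that depends on a recent `google-cloud-storage` client and should be verified against the installed version:

```python
# Current workaround: copy with GCSToGCSOperator, then patch retention metadata
# on the destination object in a separate task. If the second task fails after
# the copy succeeded, the workflow is left in a partial-success state.
from __future__ import annotations

from datetime import datetime, timedelta, timezone

from airflow.decorators import dag, task
from airflow.providers.google.cloud.hooks.gcs import GCSHook
from airflow.providers.google.cloud.transfers.gcs_to_gcs import GCSToGCSOperator


@dag(schedule=None, start_date=datetime(2024, 1, 1), catchup=False)
def compliance_copy_with_retention():
    copy_object = GCSToGCSOperator(
        task_id="copy_object",
        source_bucket="raw-bucket",            # placeholder
        source_object="audit/record.json",     # placeholder
        destination_bucket="compliance-bucket",  # placeholder
        destination_object="audit/record.json",
    )

    @task
    def set_object_retention() -> None:
        # Second, separate step that this issue proposes to fold into the operator.
        client = GCSHook().get_conn()
        blob = client.bucket("compliance-bucket").blob("audit/record.json")
        # Assumed object-retention fields; require a recent google-cloud-storage release.
        blob.retention.mode = "Unlocked"
        blob.retention.retain_until_time = datetime.now(timezone.utc) + timedelta(days=365)
        blob.patch()

    copy_object >> set_object_retention()


compliance_copy_with_retention()
```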

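For comparison, a hypothetical sketch of what the proposed single-step usage could look like. The `destination_retention_expiration_time` parameter does not exist today; its name and type are placeholders for discussion only, illustrating the intended shape of the API:

```python
# Proposed usage (hypothetical): retention enforcement colocated with the copy,
# so there is no window in which the object exists without retention metadata.
from datetime import datetime, timedelta, timezone

from airflow import DAG
from airflow.providers.google.cloud.transfers.gcs_to_gcs import GCSToGCSOperator

with DAG(
    dag_id="compliance_copy_with_retention_proposed",
    schedule=None,
    start_date=datetime(2024, 1, 1),
    catchup=False,
):
    copy_with_retention = GCSToGCSOperator(
        task_id="copy_with_retention",
        source_bucket="raw-bucket",
        source_object="audit/record.json",
        destination_bucket="compliance-bucket",
        destination_object="audit/record.json",
        # Proposed optional parameter (hypothetical, not in the provider today):
        destination_retention_expiration_time=(
            datetime.now(timezone.utc) + timedelta(days=365)
        ),
    )
```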