bujji8411 opened a new issue, #60853:
URL: https://github.com/apache/airflow/issues/60853

   ### Description
   
   Context
The GCSToGCSOperator in the Google provider supports object copy/move 
semantics but does not currently allow setting object-level retention metadata 
(a retention period or retain-until timestamp) on destination objects.
   
   In regulated or compliance-driven workflows, retention must be enforced 
immediately after object creation in the destination bucket. Today, this 
requires an additional operator or custom Python task using the GCS client.
   
   Problem with current approach
   Using two steps (copy → set retention) introduces:
   
   - Partial-success failure modes (the object is copied successfully, but the retention update fails)
   - Additional retries and operational overhead
   - More complex DAGs for what is logically a single concern
   
   While the GCS copy operation itself is atomic, the workflow-level behavior 
is not.
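   To make the partial-success window concrete, the second step today looks roughly like the sketch below. This is a minimal illustration, not the provider's code: the bucket/object names are placeholders, and the object-retention surface of the `google-cloud-storage` client assumed here (`Blob.retention`) requires a recent client version and a destination bucket created with per-object retention enabled.

```python
from datetime import datetime, timedelta, timezone


def apply_object_retention(client, bucket_name, object_name, retention_days):
    """Second step after GCSToGCSOperator: set object-level retention.

    Sketch only. Assumes `client` exposes the google-cloud-storage surface
    (bucket() -> get_blob() -> blob.retention / blob.patch()).
    """
    retain_until = datetime.now(timezone.utc) + timedelta(days=retention_days)
    blob = client.bucket(bucket_name).get_blob(object_name)
    # "Unlocked" retention can later be shortened or removed; "Locked" cannot.
    blob.retention.mode = "Unlocked"
    blob.retention.retain_until_time = retain_until
    blob.patch()  # persist the metadata change; this is the call that can fail
    return retain_until
```

   In a DAG this runs as a separate PythonOperator (or `@task`) after the copy, which is exactly the partial-success window described above: the copy task can succeed while this patch call fails.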
   
   Proposed enhancement
   Add an optional, backward-compatible parameter to GCSToGCSOperator that sets 
object-level retention metadata on destination objects immediately after the 
copy/move.
   
   Scope would be intentionally limited to:
   
   - Object-level retention only
   - No bucket-level retention changes
   - No lifecycle rule management
   
   Default behavior remains unchanged.
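   One possible shape for the new surface is sketched below. Every name under "proposed additions" is hypothetical (nothing like it exists in the provider today), and the snippet shows plain keyword arguments rather than instantiating the operator so it stays self-contained:

```python
from datetime import timedelta

# Hypothetical interface sketch only: `retention_mode` and `retention_period`
# are illustrative names, not existing GCSToGCSOperator parameters.
copy_with_retention = dict(
    task_id="copy_contracts_to_archive",
    source_bucket="staging-contracts",         # placeholder bucket names
    source_object="2024/*",
    destination_bucket="compliance-archive",
    destination_object="2024/",
    move_object=False,
    # --- proposed additions ---
    retention_mode="Unlocked",                 # or "Locked" for immutable holds
    retention_period=timedelta(days=7 * 365), # applied per destination object
)
```

   With this shape, omitting both new parameters preserves today's behavior exactly, which is what keeps the change backward compatible.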
   
   ### Use case/motivation
   
   In regulated and compliance-driven data pipelines (e.g. audit logs, 
financial records, contractual artifacts), object-level retention must be 
enforced immediately upon creation in the destination bucket.
   
   A common pattern is copying or moving objects between GCS buckets with 
different access boundaries (raw → curated, staging → compliance, regional → 
central). In these cases:
   
   - Bucket-level retention cannot always be used (shared buckets, mixed data classes)
   - Lifecycle rules may not be granular or immediate enough
   - Retention must be applied per object, deterministically
   
   Current limitation
   Today, enforcing object-level retention requires a second step after 
GCSToGCSOperator completes (e.g. another operator or custom Python using the 
GCS client). This leads to:
   
   - Partial-success scenarios (the object is copied, but the retention update fails or is retried)
   - Additional retry and idempotency handling in DAGs
   - More complex workflows for a logically single operation
   
   While the underlying GCS copy/move is atomic, the DAG-level workflow is not, 
which is problematic in compliance-sensitive pipelines.
   
   Why in the same operator
   An optional, object-level retention parameter on GCSToGCSOperator would:
   
   - Reduce DAG complexity and failure modes
   - Make retention enforcement explicit and colocated with data movement
   - Remain fully backward compatible (default behavior unchanged)
   
   Explicit non-goals
   
   - No bucket-level retention changes
   - No lifecycle rule management
   - No overlap with Storage Transfer Service operators
   
   This proposal is intentionally scoped to a small, optional enhancement that 
improves correctness and operability for compliance-driven users without 
changing existing behavior.
   
   ### Related issues
   
   _No response_
   
   ### Are you willing to submit a PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

