Copilot commented on code in PR #64755:
URL: https://github.com/apache/airflow/pull/64755#discussion_r3066485543
##########
providers/google/tests/unit/google/cloud/transfers/test_gcs_to_gcs.py:
##########
@@ -1062,3 +1062,47 @@ def
test_execute_excludes_directory_uri_from_return(self, mock_hook):
)
result = operator.execute(None)
assert result == [f"gs://{DESTINATION_BUCKET}/backup/file.txt"]
+
+ @mock.patch("airflow.providers.google.cloud.transfers.gcs_to_gcs.GCSHook")
+ def test_copy_single_file_with_retention(self, mock_hook):
+ retain_until = datetime(2027, 1, 1)
+ mock_hook.return_value.list.return_value = [SOURCE_OBJECT_NO_WILDCARD]
+ operator = GCSToGCSOperator(
+ task_id=TASK_ID,
+ source_bucket=TEST_BUCKET,
+ source_object=SOURCE_OBJECT_NO_WILDCARD,
+ destination_bucket=DESTINATION_BUCKET,
+ destination_object=SOURCE_OBJECT_NO_WILDCARD,
+ retain_until_time=retain_until,
+ retention_mode="Locked",
+ )
+
+ operator.execute(None)
+ mock_hook.return_value.rewrite.assert_called_once_with(
+ TEST_BUCKET,
+ SOURCE_OBJECT_NO_WILDCARD,
+ DESTINATION_BUCKET,
+ SOURCE_OBJECT_NO_WILDCARD,
+ retain_until_time=retain_until,
+ retention_mode="Locked",
+ )
Review Comment:
Operator tests cover (a) retention with an explicit `retention_mode` and (b)
no retention, but don’t cover the new behavior where `retain_until_time` is set
and `retention_mode` is omitted (expecting the hook default to apply). Adding a
regression test for that case would ensure the operator passes through
`retain_until_time` without adding an unexpected `retention_mode` kwarg.
##########
providers/google/src/airflow/providers/google/cloud/hooks/gcs.py:
##########
@@ -248,18 +256,30 @@ def rewrite(
source_object = source_bucket.blob(blob_name=source_object) # type:
ignore[attr-defined]
destination_bucket = client.bucket(destination_bucket)
- token, bytes_rewritten, total_bytes = destination_bucket.blob( #
type: ignore[attr-defined]
+ destination_blob = destination_bucket.blob( # type:
ignore[attr-defined]
blob_name=destination_object
- ).rewrite(source=source_object)
+ )
+
+ token, bytes_rewritten, total_bytes =
destination_blob.rewrite(source=source_object)
self.log.info("Total Bytes: %s | Bytes Written: %s", total_bytes,
bytes_rewritten)
while token is not None:
- token, bytes_rewritten, total_bytes = destination_bucket.blob( #
type: ignore[attr-defined]
- blob_name=destination_object
- ).rewrite(source=source_object, token=token)
+ token, bytes_rewritten, total_bytes =
destination_blob.rewrite(source=source_object, token=token)
self.log.info("Total Bytes: %s | Bytes Written: %s", total_bytes,
bytes_rewritten)
+
+ if retain_until_time is not None:
+ destination_blob.retention.mode = retention_mode or "Unlocked"
+ destination_blob.retention.retain_until_time = retain_until_time
+ destination_blob.patch()
Review Comment:
`rewrite()` docs say `retention_mode` can be only "Locked" or "Unlocked",
but the implementation accepts any string and will send it to GCS. Consider
validating `retention_mode` (and also raising if `retention_mode` is provided
without `retain_until_time`) to fail fast with a clear error instead of making
a request that will likely fail (or silently do nothing in the second case).
##########
providers/google/src/airflow/providers/google/cloud/transfers/gcs_to_gcs.py:
##########
@@ -96,6 +96,12 @@ class GCSToGCSOperator(BaseOperator):
copied.
:param match_glob: (Optional) filters objects based on the glob pattern
given by the string (
e.g, ``'**/*/.json'``)
+ :param retain_until_time: (Optional) A datetime specifying until when the
destination
+ objects should be retained. Requires the destination bucket to have
object retention
+ enabled. The retention is applied after each object is copied/moved.
Review Comment:
The new `retain_until_time` parameter is a datetime similar to
`last_modified_time` / `maximum_modified_time`, but its docstring doesn’t
mention timezone handling. To avoid ambiguity for users, consider documenting
whether naive datetimes are treated as UTC (or require tz-aware values),
consistent with the other datetime parameters in this operator.
```suggestion
enabled. The retention is applied after each object is copied/moved.
If tzinfo has not been set, UTC will be assumed.
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]