potiuk opened a new pull request, #67509:
URL: https://github.com/apache/airflow/pull/67509

   `GCSHook.sync_to_local_dir` and `GCSTimeSpanFileTransformOperator._download` joined GCS blob names into local paths without verifying that the resolved path stayed within the intended directory. GCS allows object names containing `..` segments, so a hostile blob name such as `../escape.py` could cause files to be written outside `local_dir` (or outside the operator's temporary directory). This is a classic CWE-22 (path traversal) sink.
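
To see why the unchecked join is unsafe, here is a minimal sketch that assumes the pre-fix code built destinations with a plain `pathlib` join (`os.path.join` is no safer, since it does not normalize `..` either). `naive_local_path` is a hypothetical stand-in, not the hook's actual code.

```python
import tempfile
from pathlib import Path


def naive_local_path(local_dir: Path, blob_name: str) -> Path:
    # Hypothetical pre-fix behaviour: the blob name is joined verbatim,
    # so any ".." segments in it are honoured by the filesystem.
    return local_dir / blob_name


local_dir = Path(tempfile.mkdtemp())  # plays the role of `local_dir`
dest = naive_local_path(local_dir, "../escape.py").resolve()

# ".." climbs one level, so the file would land next to local_dir, not in it.
print(dest.parent == local_dir.resolve().parent)  # True
print(dest.is_relative_to(local_dir.resolve()))   # False
```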
   
   The trust model matters here: a DAG author's own bucket is trusted, but the hook and operator are routinely pointed at buckets shared with external partners or other tenants, where whoever can write objects may not be fully trusted.
   
   Reported as **F-005** and **F-006** in the [`apache/tooling-agents` L3 providers/google sweep `b1aec75`](https://github.com/apache/tooling-agents/issues/34).
   
   ## Change
   
   At both sites, resolve the destination path and verify with `is_relative_to` that it lies under the target root before any download starts. On a violation, raise `ValueError` with a clear message instead of silently writing outside the target.
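
A minimal sketch of that guard as a standalone helper. `safe_local_path` is a hypothetical name, and the PR may inline the check differently at each site.

```python
from __future__ import annotations

from pathlib import Path


def safe_local_path(root: str | Path, blob_name: str) -> Path:
    """Hypothetical helper: map a GCS blob name to a path under ``root``.

    Raises ValueError instead of returning a path outside ``root``.
    """
    root_resolved = Path(root).resolve()
    dest = (root_resolved / blob_name).resolve()
    # is_relative_to (Python 3.9+) compares whole path components, so a
    # sibling like "<root>-evil" is not mistaken for a child of "<root>".
    if not dest.is_relative_to(root_resolved):
        raise ValueError(
            f"Refusing to download blob {blob_name!r}: resolved path {dest} "
            f"is outside {root_resolved}"
        )
    return dest
```

Resolving both the root and the destination keeps a symlinked root (for example on macOS, where `/tmp` is a symlink to `/private/tmp`) from producing false positives. Comparing components with `is_relative_to` also avoids the `str.startswith` pitfall, where a sibling such as `local_dir-evil` would look like a child of `local_dir`.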
   
   Sites touched:
   - [`hooks/gcs.py` `sync_to_local_dir`](https://github.com/apache/airflow/blob/main/providers/google/src/airflow/providers/google/cloud/hooks/gcs.py#L1370): check before `_sync_to_local_dir_if_changed`.
   - [`operators/gcs.py` `GCSTimeSpanFileTransformOperator._download`](https://github.com/apache/airflow/blob/main/providers/google/src/airflow/providers/google/cloud/operators/gcs.py#L894): check inside the per-blob download worker.
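
On the operator side the check sits inside the per-blob worker. The sketch below illustrates that placement, assuming downloads fan out over a `ThreadPoolExecutor` and that a caller-supplied `download(blob_name, dest)` callable performs the transfer. Both `download_all` and `download` are hypothetical, and the real `_download` may be structured differently.

```python
from __future__ import annotations

from collections.abc import Callable, Iterable
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def download_all(
    blob_names: Iterable[str],
    root: Path,
    download: Callable[[str, Path], None],  # hypothetical transfer function
    max_workers: int = 4,
) -> list[Path]:
    root_resolved = root.resolve()

    def worker(blob_name: str) -> Path:
        dest = (root_resolved / blob_name).resolve()
        # Guard first: nothing is created or fetched for a hostile name.
        if not dest.is_relative_to(root_resolved):
            raise ValueError(f"Blob {blob_name!r} resolves outside {root_resolved}")
        dest.parent.mkdir(parents=True, exist_ok=True)
        download(blob_name, dest)
        return dest

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() re-raises a worker's exception when its result is consumed.
        return list(pool.map(worker, blob_names))
```

Because the guard runs per blob, a hostile name fails before its own `download` call, but safe blobs handled by other workers may already have been written by the time the `ValueError` surfaces. Validating every name before submitting any work would make the failure all-or-nothing, at the cost of walking the blob list twice.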
   
   ## Test plan
   
   - [x] `test_sync_to_local_dir_rejects_path_traversal` (hook): a `../escape.py` blob raises `ValueError` and no file is created outside `local_dir`.
   - [x] `test_execute_rejects_path_traversal_in_blob_name` (operator): a `../escape.py` blob raises `ValueError` and `download_to_filename` is never called.
   - [x] `prek run ruff` is clean on touched files.
   - [x] Existing `test_sync_to_local_dir_behaviour` still passes (no behaviour change for safe blob names).
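
The two traversal checks can be approximated with the standard library alone. This sketch uses `unittest` and `unittest.mock` rather than the provider's own test fixtures; `download_one` is a hypothetical guarded worker, the `MagicMock` stands in for a `google.cloud.storage.Blob` (whose `download_to_filename` method is what the operator test asserts is never called), and the test names are illustrative, not the PR's.

```python
import shutil
import tempfile
import unittest
from pathlib import Path
from unittest import mock


def download_one(blob, root: Path) -> Path:
    """Hypothetical guarded worker; ``blob`` mimics google.cloud.storage.Blob."""
    root_resolved = root.resolve()
    dest = (root_resolved / blob.name).resolve()
    if not dest.is_relative_to(root_resolved):
        raise ValueError(f"Blob {blob.name!r} resolves outside {root_resolved}")
    dest.parent.mkdir(parents=True, exist_ok=True)
    blob.download_to_filename(str(dest))
    return dest


class PathTraversalTests(unittest.TestCase):
    def setUp(self) -> None:
        # Nest the target dir so an escaping write would land in a dir we own.
        self.outer = Path(tempfile.mkdtemp())
        self.addCleanup(shutil.rmtree, self.outer)
        self.root = self.outer / "local_dir"
        self.root.mkdir()

    def test_rejects_path_traversal(self) -> None:
        blob = mock.MagicMock()
        blob.name = "../escape.py"
        with self.assertRaises(ValueError):
            download_one(blob, self.root)
        blob.download_to_filename.assert_not_called()
        self.assertFalse((self.outer / "escape.py").exists())

    def test_safe_name_still_downloads(self) -> None:
        blob = mock.MagicMock()
        blob.name = "nested/ok.py"
        dest = download_one(blob, self.root)
        blob.download_to_filename.assert_called_once_with(str(dest))
        self.assertTrue(dest.is_relative_to(self.root.resolve()))
```

Run it with pytest (which also collects `unittest.TestCase` classes) or with `python -m unittest`.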
   
   ---
   
   ##### Was generative AI tooling used to co-author this PR?
   
   - [X] Yes — Claude Code (Opus 4.7)
   
   Generated-by: Claude Code (Opus 4.7) following [the guidelines](https://github.com/apache/airflow/blob/main/contributing-docs/05_pull_requests.rst#gen-ai-assisted-contributions)


-- 
This is an automated message from the Apache Git Service.