potiuk opened a new pull request, #67667: URL: https://github.com/apache/airflow/pull/67667
Two operators in the Google provider compute filesystem destination paths from GCS object names returned by `list*()` / `list_by_timespan()`, without normalizing the join result. Because GCS object names are arbitrary UTF-8 strings, a name containing `..` segments (or an absolute-path prefix) escapes the configured destination when the path lands on the SFTP server or the local worker:

- `GCSToSFTPOperator._resolve_destination_path` (`providers/google/.../transfers/gcs_to_sftp.py`): joins `source_object` to `self.destination_path` via `os.path.join`, which preserves `..` segments and lets absolute `source_object` values absorb the destination prefix entirely.
- `GCSTimeSpanFileTransformOperator._download` (`providers/google/.../operators/gcs.py`, line 899): joins `blob_name` to `temp_input_dir_path` via `pathlib.Path.__truediv__`, with the same lack of normalization.

This PR adds a post-join validation to both call sites: after computing the destination path, normalize/resolve it and verify it remains within the configured destination (or the temp input directory). When it does not, raise `AirflowException` with a descriptive message identifying the offending object/blob name.

The allowlist in `scripts/ci/prek/known_airflow_exceptions.txt` is bumped for the two new `raise AirflowException` sites (one per operator).

## Test plan

- [x] New unit tests: `..`-segment object/blob name → `AirflowException`; absolute path → `AirflowException`; benign nested path → unchanged behavior.
- [x] Existing tests pass (82 passed in `test_gcs_to_sftp.py` + `test_gcs.py`).
- [x] `prek run` green on touched files.

##### Was generative AI tooling used to co-author this PR?

- [X] Yes: Claude Opus 4.7 (1M context)

Generated-by: Claude Opus 4.7 (1M context) following the guidelines at https://github.com/apache/airflow/blob/main/contributing-docs/05_pull_requests.rst#gen-ai-assisted-contributions
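For readers unfamiliar with the pattern, the post-join validation described above can be sketched roughly as follows. This is an illustrative sketch, not the code in this PR: the helper names `safe_join_posix` and `safe_join_local` are hypothetical, `ValueError` stands in for `AirflowException` so the snippet runs without Airflow installed, and `posixpath` is used in place of `os.path` so the behavior is the same on any platform.

```python
import posixpath
from pathlib import Path


def safe_join_posix(base: str, name: str) -> str:
    """Join `name` under `base` and reject results that leave `base`.

    Hypothetical helper; ValueError stands in for AirflowException.
    """
    base_norm = posixpath.normpath(base)
    # posixpath.join discards `base` when `name` is absolute and keeps
    # ".." segments as-is, so the result is normalized before checking.
    joined = posixpath.normpath(posixpath.join(base_norm, name))
    prefix = base_norm.rstrip("/") + "/"
    if joined != base_norm and not joined.startswith(prefix):
        raise ValueError(f"Object name {name!r} resolves outside {base!r}")
    return joined


def safe_join_local(base_dir: Path, name: str) -> Path:
    """Same check for a local directory, using pathlib (Python 3.9+)."""
    base = base_dir.resolve()
    # `base / name` replaces `base` entirely when `name` is absolute.
    candidate = (base / name).resolve()
    if not candidate.is_relative_to(base):
        raise ValueError(f"Blob name {name!r} resolves outside {base_dir}")
    return candidate


if __name__ == "__main__":
    print(safe_join_posix("/upload", "reports/2024/a.csv"))
    for bad in ("../etc/passwd", "/etc/passwd", "../upload_other/x"):
        try:
            safe_join_posix("/upload", bad)
        except ValueError as err:
            print("rejected:", err)
```

Comparing against the base plus a trailing separator, rather than the bare base string, keeps a sibling directory such as `/upload_other` from passing a simple prefix check.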
