shahar1 commented on PR #67667:
URL: https://github.com/apache/airflow/pull/67667#issuecomment-4575875695

   > Two operators in the Google provider compute filesystem destination paths 
from GCS object names returned by `list*()` / `list_by_timespan()`, without 
normalizing the join result. Because GCS object names are arbitrary UTF-8 
strings, a name containing `..` segments (or an absolute-path prefix) escapes 
the configured destination when the path lands on the SFTP server or the local 
worker:
   > 
   > * `GCSToSFTPOperator._resolve_destination_path` 
(`providers/google/.../transfers/gcs_to_sftp.py`) — joins `source_object` to 
`self.destination_path` via `os.path.join`, which preserves `..` segments and 
lets absolute `source_object` values absorb the destination prefix entirely.
   > * `GCSTimeSpanFileTransformOperator._download` 
(`providers/google/.../operators/gcs.py`, line 899) — joins `blob_name` to 
`temp_input_dir_path` via `pathlib.Path.__truediv__`, with the same lack of 
normalization.
   > 
   > This PR adds a post-join validation to both call sites: after computing 
the destination path, normalize/resolve it and verify it remains within the 
configured destination (or the temp input directory). When it does not, raise 
`AirflowException` with a descriptive message identifying the offending 
object/blob name.
   > 
   > The allowlist in `scripts/ci/prek/known_airflow_exceptions.txt` is bumped 
for the two new `raise AirflowException` sites (one per operator).
   > 
   > ## Test plan
   > * [x]  New unit tests: `..`-segment object/blob name → `AirflowException`; 
absolute path → `AirflowException`; benign nested path → unchanged behavior.
   > * [x]  Existing tests pass (82 passed in `test_gcs_to_sftp.py` + 
`test_gcs.py`).
   > * [x]  `prek run` green on touched files.
   > 
   > ##### Was generative AI tooling used to co-author this PR?
   > * [x]  Yes — Claude Opus 4.7 (1M context)
   > 
   > Generated-by: Claude Opus 4.7 (1M context) following the guidelines at 
https://github.com/apache/airflow/blob/main/contributing-docs/05_pull_requests.rst#gen-ai-assisted-contributions
   
   Conflict :(
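
The post-join containment check described in the quoted PR text can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the PR's actual diff:

- `safe_remote_join`, `safe_local_join` and `DestinationEscapeError` are hypothetical names. The PR itself raises `AirflowException`.
- The remote variant assumes POSIX-style paths on the SFTP server.
- The local variant assumes Python 3.9+ for `Path.is_relative_to`.

```python
import posixpath
from pathlib import Path


class DestinationEscapeError(ValueError):
    """Hypothetical stand-in for the AirflowException raised in the PR."""


def safe_remote_join(destination_path: str, object_name: str) -> str:
    """Join a GCS object name onto a POSIX destination directory (e.g. on SFTP).

    posixpath.join keeps `..` segments and discards destination_path when
    object_name is absolute, so the joined result is normalized first and
    only then checked for containment.
    """
    base = posixpath.normpath(destination_path)
    candidate = posixpath.normpath(posixpath.join(base, object_name))
    # relpath() gives the route from base to candidate; a leading `..`
    # component means candidate lies outside base.
    route = posixpath.relpath(candidate, base)
    if route == ".." or route.startswith("../"):
        raise DestinationEscapeError(
            f"Object name {object_name!r} resolves outside {destination_path!r}"
        )
    return candidate


def safe_local_join(base_dir: Path, blob_name: str) -> Path:
    """Join a blob name onto a local directory and reject results outside it."""
    base = base_dir.resolve()
    # `/` behaves like os.path.join: an absolute blob_name replaces base.
    # resolve() collapses `..` and follows existing symlinks; the target
    # does not need to exist (strict=False is the default).
    candidate = (base / blob_name).resolve()
    if not candidate.is_relative_to(base):  # Path.is_relative_to: Python 3.9+
        raise DestinationEscapeError(
            f"Blob name {blob_name!r} resolves outside {str(base_dir)!r}"
        )
    return candidate
```

The remote variant can only normalize strings, because symlinks on the SFTP server are not visible from the worker. The local variant uses `Path.resolve()`, which also follows symlinks that already exist under the temp directory. Checking containment with `relpath` / `is_relative_to` instead of a plain string-prefix test avoids treating a sibling such as `/upload-evil` as if it were inside `/upload`.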


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
