Re: [I] ASVS scan of airflow/task-sdk (tooling-agents)

via GitHub Wed, 06 May 2026 17:08:36 -0700


potiuk commented on issue #24:
URL: https://github.com/apache/tooling-agents/issues/24#issuecomment-4393098028


   Thanks for running the scan — went through all 16 findings against the 
actual `task-sdk` code at `7ca4c75`. Triaging them through Airflow's 
[documented security 
model](https://github.com/apache/airflow/blob/main/airflow-core/docs/security/security_model.rst):
   
   **TL;DR:** 1 of 16 worth acting on (FINDING-005). The remaining 15 are 
documented design choices, deployment-hardening recommendations outside SDK 
scope, hygiene-only improvements, or false claims. Detail below — happy to 
elaborate on any individual one.
   
   ### Worth fixing
   
   - **FINDING-005** — `_match_regexp` in 
`task-sdk/src/airflow/sdk/serde/__init__.py:336` does use `p.match()` rather 
than `p.fullmatch()`. Patterns come from `[core] 
allowed_deserialization_classes_regexp` (admin-configured, default empty), and 
the admin is trusted in our model — but `match` semantics are a well-known 
Python regex footgun and `fullmatch` is the safer default. Will put up a small 
defense-in-depth PR.
   
   ### Not actionable — by category
   
   **Documented deployment-manager responsibilities (not SDK code-level 
issues):**
   - 001 (HTTP scheme) — `base_url` is admin-configured. Plaintext HTTP behind 
TLS-terminating ingress / service mesh is a normal, supported deployment.
   - 002 / 003 (provider freshness / provenance) — supply-chain concerns belong 
to `pip` + signing + SBOM tooling, not the runtime SDK. A runtime allowlist 
would just shift the trust problem.
   - 004 (`_NullFernet` fallback) — documented dev affordance; production 
deployments are documented as requiring `FERNET_KEY`. The startup `log.warning` 
already alerts.
   
   **Documented limitations explicitly out-of-scope (per security model):**
   - 016 (auth-denied vs not-found in secrets backend) — multi-team task-level 
isolation is listed as a known limitation in the project's contributing docs: 
*"multi-team not enforcing task-level isolation. These are tracked for 
improvement in future versions and should not be reported as new findings."*
   - 012 (`init_log_file` path traversal) — `local_relative_path` is composed 
from already-validated identifiers; cross-team log isolation falls under the 
same documented limitation.
   
   **No real privilege boundary being crossed:**
   - 006 (IPC frame size) — channel is supervisor↔task subprocess, both inside 
the same worker trust domain. A misbehaving task already has full code 
execution.
   - 008 / 009 (terminal state guards / overwrite) — server is the source of 
truth and enforces state transitions; supervisor `_terminal_state` is internal 
bookkeeping.
   - 010 (cache cleanup) — cache lives in a Manager process tied to the 
supervisor; both die when the per-TI task ends.
   - 011 (`dag_rel_path` traversal) — workers are *supposed to* execute 
arbitrary DAG code; the bundle directory was never claimed as a confinement 
boundary.
   - 014 / 015 (401/403 not immediate kill) — during the backoff window the 
worker's *every* API call also fails 401/403, so it can't read secrets, write 
XCom, or do anything stateful. Graceful handling of transient auth blips is 
intentional.
   
   **Premise incorrect:**
   - 013 (TLS minimum version) — Airflow requires Python ≥ 3.10 (see 
`task-sdk/pyproject.toml:27`); `ssl.create_default_context()` on 3.10+ already 
sets `minimum_version = TLSv1_2`. The "TLS 1.0/1.1 negotiable" premise doesn't 
hold for supported Python.
   
   **Hygiene observation, weak security claim:**
   - 007 (URL encoding) — true that f-string interpolation skips `quote()`. But 
`dag_id`/`task_id`/`run_id` are validated by `[scheduler] 
allowed_run_id_pattern` (default `^[A-Za-z0-9_.~:+-]+$` — no URL-special 
chars), and JWT scopes by TI ID, not URL path. Defense-in-depth at most.
   
   ### Patterns that drove false positives — feedback for the scanner
   
   A few systematic things that might help future runs:
   
   1. **Distinguish trust boundaries.** Findings repeatedly assumed boundaries 
that aren't claimed (worker ↔ DAG code, supervisor ↔ task subprocess, 
base_log_folder confinement). Worth ingesting 
`airflow-core/docs/security/security_model.rst` as model context.
   2. **Distinguish admin config from user input.** Several findings treated 
admin-set values (`base_url`, `FERNET_KEY`, regex allowlist) as untrusted. The 
deployment manager is trusted in our model.
   3. **Check Python-version-conditional defaults** before claiming a default 
is unsafe (TLS finding).
   4. **The "known limitations" section of the security model** explicitly 
lists categories that should not be reported as new findings — multi-team 
isolation, DFP/triggerer DB access, etc.
   
   Closing this issue once 005 has a PR up. Let me know if any of the above 
triage looks wrong and I'll re-examine.
   
   ---
   Drafted-by: Claude Opus 4.7 (1M context); reviewed by @potiuk before posting


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [I] ASVS scan of airflow/task-sdk (tooling-agents)

Reply via email to