potiuk commented on issue #24:
URL: https://github.com/apache/tooling-agents/issues/24#issuecomment-4399547701
Thanks @andrewmusselman — went through the L3 task-sdk report at `95bbf6a`
(122 findings). Triage below, same structure as the airflow-core L3 thread.
The auditor noted AGENTS.md context wasn't loaded for this run, which shows
in the volume — many of the by-design categories I flagged in the L1 triage
above re-surfaced under new IDs.
## Theme overlap with prior L1 triage (no further action)
| Theme | Old L1 ID | New L3 ID | Status |
|---|---|---|---|
| `_match_regexp` p.match → fullmatch | old F-005 | not in new report | ✅
merged in apache/airflow#66499 |
| HTTP scheme on `base_url` (admin-configured) | old F-001 | F-003 | by
design |
| `_NullFernet` silent dev fallback | old F-004 | F-018, F-019 | documented;
warning already emitted |
| IPC frame size — same trust domain | old F-006 | F-008 (DoS), F-010
(string fields), F-027 (rate limit), F-028 (FD exhaustion) | by design (worker
is the trust domain) |
| Worker / DAG code execution boundary | old F-011 | F-024, F-025 | by
design (DAG authors execute arbitrary Python by definition) |
| Terminal state guards / 401-403 backoff | old F-008/009/014/015 | F-006,
F-007 (specific scenarios — see below) | partially extends prior triage |
| Multi-team task-level isolation | old F-012/016 | F-014, F-016, F-040,
F-053 | known limitation per security model |
| URL encoding in Execution API path construction | old F-007 | F-039, F-046
| hygiene only — admin-configured `allowed_run_id_pattern` |
## New actionable findings — investigating / fixing
| # | Finding | Verdict | Plan |
|---|---|---|---|
| **F-005** | `httpx.ConnectError`/`TimeoutException` not caught by `except
ServerResponseError` in `supervisor.py:754` — generator crashes, IPC channel
permanently broken | **Real reliability bug.** Network exceptions during a
transient API outage shouldn't kill the worker's IPC channel. | Add catch-all
in `handle_requests` returning `ErrorResponse` |
| **F-006** | Failed task marked SUCCESS when terminal-state IPC `send`
fails in `task_runner.py` `finally` block — process exits 0, supervisor's
`final_state` returns `SUCCESS` | **Real correctness bug** (downstream pipeline
runs on bad upstream). | `sys.exit(1)` on send failure |
| **F-007** | `_terminal_state` set BEFORE `client.task_instances.succeed()`
succeeds; on transient API failure task gets stuck RUNNING on server forever |
**Real reliability bug** affecting all `STATES_SENT_DIRECTLY`. | Set
`_terminal_state` only after API call succeeds, or implement retry-on-failure |
| **F-004** | Remote-log-handler load failures silently return; operators
have no signal that logs aren't reaching the remote system | **Logging
blind-spot fix.** Small structured-warning addition. | Add
`log.warning("remote_log_handler_unavailable", …)` |
| **F-017** | Both `get_connection()` and `get_variable()` treat all
`ErrorResponse` types identically; an authz-denied response falls through to
`EnvironmentVariablesBackend` which doesn't check authz, defeating the explicit
deny | **Real authz fallthrough bug.** Distinguish "not found" from "denied" in
the response and raise on denied to prevent backend fallback. | Differentiate
error types + raise on authz failure |
I'll open separate PRs for each and link them back to this comment.
## Acknowledged but not actionable
| # | Finding | Why |
|---|---|---|
| F-001 | `NativeEnvironment` bypasses Jinja sandbox when
`render_template_as_native_obj=True` | DAG authors execute arbitrary Python by
design — they don't need a sandbox bypass to run dangerous code. The "false
confidence" claim about `is_safe_attribute()` becoming dead code in the native
branch is fair, but the sandbox itself is a defense against rogue *templates*,
not rogue *DAG authors*. The audit's recommended `SandboxedNativeEnvironment`
mixin would change rendering semantics in user-visible ways and is worth an
AIP-style design discussion, not a quiet code change. |
| F-002 | No crypto agility — Fernet (AES-128-CBC + HMAC-SHA256) hardcoded |
AES-128 is not deprecated — it remains a NIST-approved algorithm. Fernet
provides authenticated encryption + a versioning byte. Adding a configurable
encryption-backend abstraction is a substantial migration with data-format
compatibility implications across providers and existing encrypted secrets;
would be tracked as an AIP, not a defect fix. |
| F-008, F-027, F-028 | Unbounded IPC sizes / no rate limit between
supervisor and task subprocess | Both endpoints live in the same worker
process; a misbehaving task already has full code execution. The IPC channel is
not a privilege boundary in our model. (Same rationale as L1 F-006.) |
| F-011, F-012, F-013 | Private-attribute access via `is_safe_attribute`,
`jinja_environment_kwargs` overrides, `from_string()` no validation | Jinja
sandbox hardening for DAG author code — same threat model as F-001. By design. |
| F-026, F-029 | Unbounded polling / no quota on XCom/Variable writes |
Tasks can already do anything that costs resources in the cluster. |
## By design — per Airflow's security model
The remaining ~95 findings cluster:
| Category | Findings | Why |
|---|---|---|
| Documentation deliverables required by ASVS L3 | F-020, F-021, F-023,
F-030, F-031, F-032, F-033, F-034, F-037, F-041, F-052, F-054, F-058, F-064,
F-066, F-068, F-074, F-077, F-082, F-085, F-093, F-095 | Inventory /
classification / rotation-schedule docs that L3 asks for. Worth filing
individually as doc-improvement issues if any are blocking your audit; not
security defects. |
| Multi-team task-level isolation | F-014, F-016, F-040, F-053 | Documented
known limitation in the security model. |
| Logging architecture (encoding, "who" metadata, separation of audit vs
event logs) | F-015, F-035, F-036, F-038, F-070, F-071, F-073 | Substantive
improvements but deployment-shape choices, not code-level defects. |
| Rate limiting / DoS within worker trust domain | F-008, F-026, F-027,
F-028, F-029 | Trust domain — same as L1 F-006 rationale. |
| Crypto-policy / key-management documentation | F-018, F-019, F-020, F-021,
F-042, F-043, F-079, F-094 | Same architectural rationale as F-002. |
| Reverse-proxy / TLS / cipher / mTLS hardening | F-021, F-044, F-045,
F-098, F-099, F-100, F-101, F-103 | Deployment-manager responsibility. |
| Supply-chain / SBOM / dependency provenance | F-022, F-023, F-024, F-025,
F-067 | Pip/SBOM tooling layer, not the runtime SDK. |
| Premise factually wrong / requires Python ≥ 3.10 default | various
TLS-version findings | Already covered in L1 triage. |
## Patterns worth feeding back to the scanner (same notes as L1)
1. **Trust boundaries** — many findings re-assume task-subprocess ↔
supervisor is a privilege boundary. It isn't, in our model.
2. **DAG authors execute arbitrary Python** — Jinja sandbox findings keep
treating DAG authors as untrusted; they aren't.
3. **Multi-team task-level isolation** — explicitly documented as "not
enforced; tracked for future versions; should not be reported as new findings".
4. **Distinguish "AIP-grade architecture change" from "code-level defect"**
— F-001 (Jinja native) and F-002 (crypto agility) are the former, not the
latter.
Once AGENTS.md is in scanner context for the next run a lot of these should
drop out. Happy to dig deeper on any specific one if my reasoning above looks
off.
---
This response was generated by AI and may contain mistakes. After you've had
a chance to look at the PRs and respond, a real Airflow maintainer (human) will
follow up.
---
Drafted-by: Claude Opus 4.7 (1M context); reviewed by @potiuk before posting
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]