potiuk commented on PR #67503:
URL: https://github.com/apache/airflow/pull/67503#issuecomment-4633488155

   To lay out where this stands and a way to make it stronger.
   
   **Current approach.** The PR replaces the old `with suppress(Exception)` 
around the supervisor IPC with a `try/except` that logs a dedicated 
`supervisor_mask_secret_failed` warning. The motivation is that this is a 
*security* path, and the failure is silent today: `add_mask` is deliberately 
built to mask non-`str`/`dict` iterables element-wise (its `elif 
isinstance(secret, Iterable)` branch — `set`/`tuple`/`frozenset`/generators), 
but `MaskSecret.value: JsonValue` rejects exactly those at construction. So 
`mask_secret(<set of secrets>)` masks locally yet never registers with the 
supervisor, and under `sending_to_supervisor=True` (where the task drops its 
own `mask_logs` processor) the secret surfaces unmasked in supervisor-side 
logs. A small test pins this down: same input fails on `main` (warning 
swallowed) and passes here (warning fires), with local masking and 
`comms.send`-not-reached both holding — i.e. a real, supervisor-alive silent 
leak.
   
   **On the "the types already prevent this" point.** You're right that 
`JsonValue` + mypy reject a literal `set`/`tuple`, and I agree a 
statically-`JsonValue` value can't fail this. But I'm not comfortable basing a 
security control on mypy at all: there's no guarantee or expectation that our 
users run mypy — it isn't part of the contract for Dag/provider authors, and 
until the mypy plugin we're currently preparing actually ships, running mypy 
against Airflow code is practically discouraged. On top of that, `mask_secret` 
is public API and the values that actually reach it are routinely `Any` 
(anything pulled out of a bare `dict`, a secrets backend, etc.), which mypy 
waves straight through even if the author does run it. For redaction, "the 
annotation says it can't happen" isn't the same as "it can't happen", and the 
cost of being wrong is a leaked secret — so I'd rather have a runtime signal 
than rely on the annotation.
   
   **Proposal to make it stronger.** Rather than just *reporting* the gap, we 
can mostly *close* it by coercing the non-JSON iterables to a `list` before 
building the message, so they actually propagate and get masked supervisor-side:
   
   ```python
   ipc_value = list(secret) if isinstance(secret, (set, frozenset, tuple)) else 
secret
   comms.send(MaskSecret(value=ipc_value, name=name))
   ```
   
   That makes the two paths agree for the common case (a 
`set`/`tuple`/`frozenset` of secrets) instead of papering over the difference. 
The warning then drops to a genuine last-resort guard for the residue coercion 
can't save — an exhausted generator (already consumed by the earlier 
`add_mask`) or a truly non-serializable object — which is a much more 
defensible role for it. I'd also document the `JsonValue` constraint on 
`MaskSecret` so the boundary isn't implicit.
   
   Happy to push the coercion + a test asserting the supervisor now receives 
the masked list, if you're on board — or keep it warning-only if you'd prefer 
the smaller change. Which do you want?
   
   ---
   Drafted-by: Claude Code (Opus 4.8); reviewed by @potiuk before posting
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to