Thanks for the thoughtful questions, Amogh. These are exactly the right
things to consider before committing resources. Let me address each one:

> 1. Where do these tests run? How long would it take to run? Any
> special needs? Cadence?

The proposal is to integrate with **OSS-Fuzz**, Google's continuous
fuzzing infrastructure for open source projects.

This means:

- Tests run on Google's infrastructure at no cost to the project
- Fuzzing runs continuously 24/7, not blocking CI
- No special hardware or infrastructure needs from our side

Optionally, fuzzers can run locally or in existing CI as quick sanity
checks (seconds to minutes), while deep fuzzing happens
asynchronously on OSS-Fuzz.
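
To make that concrete: a fuzz harness is just a small standalone Python
script. Here's a rough sketch using Atheris (the coverage-guided Python
fuzzing engine OSS-Fuzz uses for Python projects); the file name and the
`json.loads` target are placeholders, not what we'd actually ship:

```python
# fuzz_smoke.py - illustrative only; json.loads stands in for whatever
# Airflow function a real harness would import and exercise.
import sys

import atheris

with atheris.instrument_imports():
    import json


def test_one_input(data: bytes) -> None:
    try:
        json.loads(data)
    except (ValueError, RecursionError):
        # Expected failures on malformed input - not findings.
        pass


if __name__ == "__main__":
    atheris.Setup(sys.argv, test_one_input)
    atheris.Fuzz()
```

The same file can run in all three places: locally during development
(`python fuzz_smoke.py`), in CI with a time budget via standard libFuzzer
flags (e.g. `python fuzz_smoke.py -max_total_time=60`), and unbounded on
OSS-Fuzz.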

> 2. I see an initial maintenance burden too - who will own it /
> maintain it? Who will triage the reports? (false positives,
> duplicates, low priority bugs)

Once integrated, OSS-Fuzz operates autonomously. We have full control
over how findings are handled:

- Bugs are reported to the **OSS-Fuzz dashboard**, not directly to our
  issue tracker
- We can **enable or disable** automatic GitHub issue creation
- Findings are private for 90 days, then become public if unfixed

That 90-day window does create some pressure to address findings, but
the alternative is worse. These bugs exist whether or not we're
fuzzing; an external researcher or attacker finding them first gives us
zero lead time. OSS-Fuzz guarantees we hear about them first, with 90
days to respond privately.

I'll handle the **initial integration work** - writing the fuzzers,
setting up the OSS-Fuzz project config, and verifying the build runs. After that,
maintenance is minimal; fuzzers rarely need updates unless the APIs
they target change significantly.

> 3. Airflow assumes trusted users, so some findings through the fuzzer
> might not be exploitable at all, but would lead to time spent triaging
> that.

Fair point. We can mitigate this by scoping fuzzers to code paths where
the security boundary is clear - input parsing, serialization, external
protocol handling - and excluding areas where Airflow's trusted-user
model means findings wouldn't be actionable.
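
As a sketch of what that scoping looks like, the harness below only
flags unexpected exceptions: error types the target is documented to
raise on bad input are swallowed, so non-actionable noise never reaches
the dashboard. `urllib.parse.urlsplit` is a stand-in here for the
Airflow-side parsing code (e.g. connection URI handling) a real harness
would target:

```python
# Illustrative scoped harness - urlsplit stands in for an Airflow
# parsing function such as connection URI handling.
import sys

import atheris

with atheris.instrument_imports():
    from urllib.parse import urlsplit


def test_one_input(data: bytes) -> None:
    fdp = atheris.FuzzedDataProvider(data)
    candidate = fdp.ConsumeUnicodeNoSurrogates(256)
    try:
        urlsplit(candidate)
    except ValueError:
        # Documented failure mode for malformed input - not a finding.
        pass
    # Anything else (unexpected exception, crash, hang) is reported.


if __name__ == "__main__":
    atheris.Setup(sys.argv, test_one_input)
    atheris.Fuzz()
```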

> 4. DAG runs user code end of the day, fuzzer may find issues in user
> code instead? Can we control that?

Fuzzers work like regression tests - they target Airflow's own code
paths, not user DAGs. Like our existing test suite, they import and
exercise specific modules directly:

- Input parsing and validation functions
- Serialization/deserialization (pickle, JSON, etc.)
- Command construction utilities
- Connection parameter handling

No DAG is ever loaded or executed. The fuzzer imports a function, feeds
it inputs, and checks for crashes - exactly like a unit test, just with
generated inputs instead of handwritten ones.
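
To make the analogy concrete, here's roughly what such a harness looks
like. The round-trip property and the use of `json` are illustrative
stand-ins for whichever Airflow serialization helpers we decide to
cover; the structure - import, call, check - is the same as any
regression test:

```python
# Illustrative round-trip harness; json stands in for the Airflow
# serialization helpers a real harness would import.
import json
import sys

import atheris


def test_one_input(data: bytes) -> None:
    fdp = atheris.FuzzedDataProvider(data)
    # Build a structured value from fuzzer-generated primitives,
    # much as a unit test would build a fixture by hand.
    payload = {
        "key": fdp.ConsumeUnicodeNoSurrogates(64),
        "count": fdp.ConsumeIntInRange(-(2**31), 2**31 - 1),
        "flag": fdp.ConsumeBool(),
    }
    # Serialize, deserialize, and check the round trip is lossless.
    assert json.loads(json.dumps(payload)) == payload


if __name__ == "__main__":
    atheris.instrument_all()
    atheris.Setup(sys.argv, test_one_input)
    atheris.Fuzz()
```

Beyond crashes, a harness can also assert invariants like this round
trip, which catches silent corruption as well as hard failures.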

> 5. Our ecosystem of tons of providers may require us to spend
> significant initial time to cover that surface area and later
> maintain it

Agreed, this is a large surface area. The proposal is not to fuzz all
providers immediately. Instead:

- **Phase 1:** Core Airflow only (serializers, API input handling,
  scheduler internals)
- **Phase 2:** High-risk providers with shell/exec patterns (SSH,
  Docker, Kubernetes, Teradata)
- **Phase 3:** Community-driven expansion as we see value

This mirrors how other large projects (Kubernetes, Envoy) adopted
fuzzing: start narrow, prove value, expand organically.

The bottom line: with OSS-Fuzz handling the infrastructure, the upfront
cost is a small integration PR and the ongoing commitment is minimal.
We get 90 days of
private lead time on any bugs found - far better than the zero days
we'd get if external researchers find them first. Happy to start with
a minimal proof-of-concept targeting just the serialization layer if
that helps demonstrate value.

Best,

Leslie
