jscheffl commented on code in PR #60148: URL: https://github.com/apache/airflow/pull/60148#discussion_r2669969182
##########
ossfuzz/README.md:
##########
@@ -0,0 +1,70 @@
+# Airflow OSS-Fuzz fuzzers
+
+This directory contains the upstream-owned fuzz targets used by OSS-Fuzz for
+Apache Airflow.
+
+## Security Model Alignment
+
+These fuzzers target code paths with **clear security boundaries** per
+Airflow's [security model](../airflow-core/docs/security/security_model.rst):
+
+- **DAG Serialization/Deserialization**: Used by Scheduler and API Server with
+  schema validation. Input comes from DAG parsing and caching.
+- **Connection URI Parsing**: Used when creating/updating connections via API.
+
+We explicitly **avoid** fuzzing code paths in the "DAG author trust zone"
+where Airflow's policy is that DAG authors can execute arbitrary code.
+
+## What's here
+
+- `*_fuzz.py`: Atheris fuzz targets (packaged by OSS-Fuzz via `pyinstaller`).
+- `*.dict`: Optional libFuzzer dictionaries for structured inputs.
+- `*.options`: libFuzzer options (e.g. `max_len`) tuned per target.
+- `seed_corpus/<fuzzer>/...`: Small seed corpora that get zipped and uploaded to
+  OSS-Fuzz for each target.
+
+## Fuzzers
+
+| Fuzzer | Target | Security Boundary |
+|--------|--------|-------------------|
+| `serialized_dag_fuzz.py` | `DagSerialization.from_dict()` | Schema validation |
+| `connection_uri_fuzz.py` | `Connection._parse_from_uri()` | API input validation |
+
+## Supported engines / sanitizers (Python constraints)
+
+Airflow is fuzzed as a **Python** OSS-Fuzz project. Practically, this means:
+
+- **Fuzzing engine**: `libfuzzer` (Atheris). Other engines (AFL/honggfuzz) are
+  not typically used/supported for Python targets in OSS-Fuzz.
+- **Sanitizers**: `address`, `undefined`, `coverage`, `introspector` are the
+  relevant modes. **MSan (`memory`) is not supported** for Python OSS-Fuzz
+  projects.
+
+## Running locally with OSS-Fuzz helper
+
+From a checkout of `google/oss-fuzz`:
+
+```bash
+# Build + basic validation:
+python3 infra/helper.py build_fuzzers --clean --sanitizer address airflow /path/to/airflow
+python3 infra/helper.py check_build --sanitizer address airflow
+
+# Coverage build + validation:
+python3 infra/helper.py build_fuzzers --clean --sanitizer coverage airflow /path/to/airflow
+python3 infra/helper.py check_build --sanitizer coverage airflow
+```
+
+## Running locally without OSS-Fuzz

Review Comment:
@potiuk I'd assume that Python 3.11+ would be OK. Could we take the Python 3.11 prod images and install clang on top for the image used for fuzzing? Or do we need to integrate with every "normal" image? Even if our CI runs Python 3.10, we still check against all versions, so fuzzing could technically also run on a higher version. I'd assume fuzzing results are version-independent.

Reproducibility is not important in my view, because the fuzzer outputs the faulty input, which can be used to reproduce any detected fault. There is no need to reproduce the fuzzing run bit-identically on a local machine.
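
To illustrate the reproducibility point, here is a minimal sketch of replaying a saved faulty input outside of any fuzzing run. It assumes the fuzzer wrote the failing bytes to a file (libFuzzer-based engines do this on a crash). `parse_uri`, `replay_bytes`, and `replay_file` are hypothetical names for illustration only; `parse_uri` is a stand-in built on the standard library, not Airflow's `Connection._parse_from_uri()`.

```python
import sys
from pathlib import Path
from typing import Callable, Optional
from urllib.parse import urlsplit


def parse_uri(data: bytes) -> None:
    """Hypothetical stand-in for the code under test (not Airflow's parser)."""
    text = data.decode("utf-8", errors="replace")
    parts = urlsplit(text)  # may raise ValueError, e.g. for a malformed IPv6 host
    _ = parts.port  # raises ValueError for a non-numeric or out-of-range port


def replay_bytes(data: bytes, target: Callable[[bytes], None]) -> Optional[Exception]:
    """Run the target once on a saved input; return the exception it raised, if any."""
    try:
        target(data)
    except Exception as exc:
        return exc
    return None


def replay_file(path: str, target: Callable[[bytes], None]) -> Optional[Exception]:
    """Replay an input file written by the fuzzer (the path is supplied by the caller)."""
    return replay_bytes(Path(path).read_bytes(), target)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (assumed): python replay.py <path-to-saved-input>
    error = replay_file(sys.argv[1], parse_uri)
    print("no failure" if error is None else f"reproduced: {error!r}")
```

Because the replay depends only on the saved bytes and the target function, it should not matter which machine or image produced the input, as long as the Python version and the code under test behave the same for that input.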
