This is an automated email from the ASF dual-hosted git repository.

potiuk pushed a commit to branch 3.2.0-docs
in repository https://gitbox.apache.org/repos/asf/airflow.git
commit bd0274d0b6d000675f28bdf57270e04a6b22cd93
Author: Jarek Potiuk <[email protected]>
AuthorDate: Mon Apr 6 16:19:09 2026 +0200

    Fix spelling errors and use 'potentially' for DFP/Triggerer access

    - Add dumpable, sandboxing, unsanitized, XSS to spelling wordlist
    - Use 'potentially' consistently when describing Dag File Processor and
      Triggerer database access and JWT authentication bypass, since these are
      capabilities that Dag author code could exploit rather than guaranteed
      behaviors of normal operation
---
 .github/instructions/code-review.instructions.md |  2 +-
 AGENTS.md                                        | 19 +++----
 .../docs/installation/upgrading_to_airflow3.rst  |  2 +-
 .../docs/security/jwt_token_authentication.rst   | 41 +++++++--------
 airflow-core/docs/security/security_model.rst    | 60 +++++++++++-----------
 docs/spelling_wordlist.txt                       |  4 ++
 6 files changed, 68 insertions(+), 60 deletions(-)

diff --git a/.github/instructions/code-review.instructions.md b/.github/instructions/code-review.instructions.md
index 411f0814289..cd480bdcaf7 100644
--- a/.github/instructions/code-review.instructions.md
+++ b/.github/instructions/code-review.instructions.md
@@ -11,7 +11,7 @@ Use these rules when reviewing pull requests to the Apache Airflow repository.
 
 - **Scheduler must never run user code.** It only processes serialized Dags. Flag any scheduler-path code that deserializes or executes Dag/task code.
 - **Flag any task execution code that accesses the metadata DB directly** instead of through the Execution API (`/execution` endpoints).
-- **Flag any code in Dag Processor or Triggerer that breaks process isolation** — these components run user code in separate processes from the Scheduler and API Server, but note that they have direct metadata database access and bypass JWT authentication via in-process Execution API transport. This is an intentional design choice documented in the security model, not a security vulnerability.
+- **Flag any code in Dag Processor or Triggerer that breaks process isolation** — these components run user code in separate processes from the Scheduler and API Server, but note that they potentially have direct metadata database access and potentially bypass JWT authentication via in-process Execution API transport. This is an intentional design choice documented in the security model, not a security vulnerability.
 - **Flag any provider importing core internals** like `SUPERVISOR_COMMS` or task-runner plumbing. Providers interact through the public SDK and execution API only.
 
 ## Database and Query Correctness
diff --git a/AGENTS.md b/AGENTS.md
index 1925cce4a86..ac347fd2e91 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -66,11 +66,11 @@ UV workspace monorepo. Key paths:
 ## Architecture Boundaries
 
 1. Users author Dags with the Task SDK (`airflow.sdk`).
-2. Dag File Processor parses Dag files in separate processes and stores serialized Dags in the metadata DB. It has **direct database access** and uses an in-process Execution API transport that **bypasses JWT authentication**.
+2. Dag File Processor parses Dag files in separate processes and stores serialized Dags in the metadata DB. It potentially has **direct database access** and uses an in-process Execution API transport that **potentially bypasses JWT authentication**.
 3. Scheduler reads serialized Dags — **never runs user code** — and creates Dag runs / task instances.
 4. Workers execute tasks via Task SDK and communicate with the API server through the Execution API — **never access the metadata DB directly**. Each task receives a short-lived JWT token scoped to its task instance ID.
 5. API Server serves the React UI and handles all client-database interactions.
-6. Triggerer evaluates deferred tasks/sensors in separate processes. Like the Dag File Processor, it has **direct database access** and uses an in-process Execution API transport that **bypasses JWT authentication**.
+6. Triggerer evaluates deferred tasks/sensors in separate processes. Like the Dag File Processor, it potentially has **direct database access** and uses an in-process Execution API transport that **potentially bypasses JWT authentication**.
 7. Shared libraries that are symbolically linked to different Python distributions are in `shared` folder.
 8. Airflow uses `uv workspace` feature to keep all the distributions sharing dependencies and venv
 9. Each of the distributions should declare other needed distributions: `uv --project <FOLDER> sync` command acts on the selected project in the monorepo with only dependencies that it has
@@ -84,13 +84,14 @@ and [`airflow-core/docs/security/jwt_token_authentication.rst`](airflow-core/doc
 
 **The following are intentional design choices, not security vulnerabilities:**
 
-- **Dag File Processor and Triggerer bypass JWT authentication.** They use `InProcessExecutionAPI`
-  which overrides the JWT bearer dependency to always allow access. This is by design — these
-  components run within trusted infrastructure and need direct database access for their core
-  operations (storing serialized Dags, managing trigger state).
-- **Dag File Processor and Triggerer have direct metadata database access.** User-submitted code
-  (Dag files, trigger code) executes in these components and can potentially access the database.
-  This is a known limitation documented in the security model, not an undiscovered vulnerability.
+- **Dag File Processor and Triggerer potentially bypass JWT authentication.** They use
+  `InProcessExecutionAPI` which overrides the JWT bearer dependency to always allow access. This
+  is by design — these components run within trusted infrastructure and potentially need direct
+  database access for their core operations (storing serialized Dags, managing trigger state).
+- **Dag File Processor and Triggerer potentially have direct metadata database access.**
+  User-submitted code (Dag files, trigger code) executes in these components and can potentially
+  access the database. This is a known limitation documented in the security model, not an
+  undiscovered vulnerability.
 - **Worker Execution API tokens grant access to shared resources.** While `ti:self` scope prevents
   cross-task state manipulation, connections, variables, and XComs are accessible to all tasks.
   This is the current design — finer-grained scoping is planned for future versions.
diff --git a/airflow-core/docs/installation/upgrading_to_airflow3.rst b/airflow-core/docs/installation/upgrading_to_airflow3.rst
index 2f5cfea324c..ad0b5507b62 100644
--- a/airflow-core/docs/installation/upgrading_to_airflow3.rst
+++ b/airflow-core/docs/installation/upgrading_to_airflow3.rst
@@ -54,7 +54,7 @@ In Airflow 3, direct metadata database access from task code is now restricted.
 
 - **No Direct Database Access**: Task code can no longer directly import and use Airflow database sessions or models.
 - **API-Based Resource Access**: All runtime interactions (state transitions, heartbeats, XComs, and resource fetching) are handled through a dedicated Task Execution API.
-- **Enhanced Security**: This improves isolation and security by preventing worker task code from directly accessing or modifying the Airflow metadata database. Note that Dag author code still executes with direct database access in the Dag File Processor and Triggerer — see :doc:`/security/security_model` for details.
+- **Enhanced Security**: This improves isolation and security by preventing worker task code from directly accessing or modifying the Airflow metadata database. Note that Dag author code potentially still executes with direct database access in the Dag File Processor and Triggerer — see :doc:`/security/security_model` for details.
 - **Stable Interface**: The Task SDK provides a stable, forward-compatible interface for accessing Airflow resources without direct database dependencies.
 
 Step 1: Take care of prerequisites
diff --git a/airflow-core/docs/security/jwt_token_authentication.rst b/airflow-core/docs/security/jwt_token_authentication.rst
index bd897a681d6..87354039447 100644
--- a/airflow-core/docs/security/jwt_token_authentication.rst
+++ b/airflow-core/docs/security/jwt_token_authentication.rst
@@ -298,28 +298,29 @@ interact with the Execution API, but they do so via an **in-process** transport
 
 - Runs the Execution API application directly within the same process, using an ASGI/WSGI
   bridge.
-- **Bypasses JWT authentication entirely** — the JWT bearer dependency is overridden to
-  always return a synthetic ``TIToken`` with the ``"execution"`` scope.
-- Also bypasses per-resource access controls (connection, variable, and XCom access checks
-  are overridden to always allow).
-
-This design means that code running in the Dag File Processor or Triggerer has **unrestricted
-access** to all Execution API operations without needing a valid JWT token. Since the Dag File
-Processor parses user-submitted Dag files and the Triggerer executes user-submitted trigger
-code, Dag authors whose code runs in these components effectively have the same level of
-access as the internal API itself.
+- **Potentially bypasses JWT authentication** — the JWT bearer dependency is overridden to
+  always return a synthetic ``TIToken`` with the ``"execution"`` scope, effectively bypassing
+  token validation.
+- Also potentially bypasses per-resource access controls (connection, variable, and XCom access
+  checks are overridden to always allow).
+
+This design means that code running in the Dag File Processor or Triggerer potentially has
+**unrestricted access** to all Execution API operations without needing a valid JWT token. Since
+the Dag File Processor parses user-submitted Dag files and the Triggerer executes user-submitted
+trigger code, Dag authors whose code runs in these components could potentially have the same
+level of access as the internal API itself.
 
 In the default deployment, a **single Dag File Processor instance** parses Dag files for all
 teams and a **single Triggerer instance** handles all triggers across all teams. This means
-that Dag author code from different teams executes within the same process, with shared access
-to the in-process Execution API and the metadata database.
+that Dag author code from different teams executes within the same process, with potentially
+shared access to the in-process Execution API and the metadata database.
 
 For multi-team deployments that require isolation, Deployment Managers must run **separate
 Dag File Processor and Triggerer instances per team** as a deployment-level measure — Airflow
 does not provide built-in support for per-team DFP or Triggerer instances. However, even with
-separate instances, these components still have direct access to the metadata database
-(the Dag File Processor needs it to store serialized Dags, and the Triggerer needs it to
-manage trigger state). A Dag author whose code runs in these components can potentially
+separate instances, these components still potentially have direct access to the metadata
+database (the Dag File Processor needs it to store serialized Dags, and the Triggerer needs it
+to manage trigger state). A Dag author whose code runs in these components can potentially
 access the database directly, including reading or modifying data belonging to other teams,
 or obtaining the JWT signing key if it is available in the process environment.
 
@@ -374,13 +375,13 @@ The current JWT authentication model operates under the following assumptions an
     separation between teams. Task-level team isolation will be improved in future versions of
     Airflow.
-**Dag File Processor and Triggerer bypass**
+**Dag File Processor and Triggerer potentially bypass JWT and access the database**
     As described above, the default deployment runs a single Dag File Processor and a single
-    Triggerer for all teams. Both bypass JWT authentication entirely via in-process transport.
+    Triggerer for all teams. Both potentially bypass JWT authentication via in-process transport.
     For multi-team isolation, Deployment Managers must run separate instances per team, but
-    even then, each instance retains direct database access. A Dag author whose code runs
-    in these components can potentially access the database directly — including data belonging
-    to other teams or the JWT signing key configuration — unless the Deployment Manager
+    even then, each instance potentially retains direct database access. A Dag author whose code
+    runs in these components can potentially access the database directly — including data
+    belonging to other teams or the JWT signing key configuration — unless the Deployment Manager
     restricts the database credentials and configuration available to each instance.
 
 **Planned improvements**
diff --git a/airflow-core/docs/security/security_model.rst b/airflow-core/docs/security/security_model.rst
index d030e879096..cb1ade8e4f8 100644
--- a/airflow-core/docs/security/security_model.rst
+++ b/airflow-core/docs/security/security_model.rst
@@ -69,7 +69,7 @@ the Dag File Processor, and the Triggerer, and potentially access the credential
 code uses to access external systems. In Airflow 3, worker task code communicates with the
 API server exclusively through the Execution API and does not have direct access to the
 metadata database. However, Dag author code that executes in the Dag File Processor
-and Triggerer still has direct access to the metadata database, as these components
+and Triggerer potentially still has direct access to the metadata database, as these components
 require it for their operation (see :ref:`jwt-authentication-and-workload-isolation` for details).
 
 Authenticated UI users
@@ -204,7 +204,7 @@ Limiting Dag Author access to subset of Dags
 Airflow does not yet provide full task-level isolation between different groups of users when it comes
 to task execution. While, in Airflow 3.0 and later, worker task code cannot directly access the metadata
 database (it communicates through the Execution API), Dag author code that runs in the Dag File
-Processor and Triggerer still has direct database access. Regardless of execution context, Dag authors
+Processor and Triggerer potentially still has direct database access. Regardless of execution context, Dag authors
 have access to all Dags in the Airflow installation and they can modify any of those Dags - no matter
 which Dag the task code is executed for. This means that Dag authors can modify state of any task
 instance of any Dag, and there are no finer-grained access controls to limit that access.
@@ -256,9 +256,10 @@ enforcement mechanisms that would allow to isolate tasks that are using deferrab
 each other and arbitrary code from various tasks can be executed in the same process/machine. The
 default deployment runs a single Triggerer instance that handles triggers from all teams — there is no
 built-in support for per-team Triggerer instances. Additionally, the Triggerer uses an in-process Execution API
-transport that bypasses JWT authentication and has direct access to the metadata database. For
-multi-team deployments, Deployment Managers must run separate Triggerer instances per team as a
-deployment-level measure, but even then each instance retains direct database access and a Dag author
+transport that potentially bypasses JWT authentication and potentially has direct access to the metadata
+database. For multi-team deployments, Deployment Managers must run separate Triggerer instances per team
+as a deployment-level measure, but even then each instance potentially retains direct database access
+and a Dag author
 whose trigger code runs there can potentially access the database directly — including data belonging
 to other teams. Deployment Manager must trust that Dag authors will not abuse this capability.
 
@@ -317,34 +318,35 @@ Current isolation limitations
 While Airflow 3 significantly improved the security model by preventing worker task code from
 directly accessing the metadata database (workers now communicate exclusively through the
 Execution API), **perfect isolation between Dag authors is not yet achieved**. Dag author code
-still executes with direct database access in the Dag File Processor and Triggerer. The
-following gaps exist:
+potentially still executes with direct database access in the Dag File Processor and Triggerer.
+The following gaps exist:
 
-**Dag File Processor and Triggerer bypass JWT authentication**
+**Dag File Processor and Triggerer potentially bypass JWT authentication**
   The Dag File Processor and Triggerer use an in-process transport to access the Execution API,
-  which bypasses JWT authentication entirely. Since these components execute user-submitted code
-  (Dag files and trigger code respectively), a Dag author whose code runs in these components has
-  unrestricted access to all Execution API operations — including the ability to read any connection,
-  variable, or XCom — without needing a valid JWT token.
+  which potentially bypasses JWT authentication. Since these components execute user-submitted code
+  (Dag files and trigger code respectively), a Dag author whose code runs in these components
+  potentially has unrestricted access to all Execution API operations — including the ability to
+  read any connection, variable, or XCom — without needing a valid JWT token.
 
-  Furthermore, the Dag File Processor has direct access to the metadata database (it needs this to
-  store serialized Dags). Dag author code executing in the Dag File Processor context could potentially
-  access the database directly, including the signing key configuration if it is available in the
-  process environment. If a Dag author obtains the JWT signing key, they could forge arbitrary tokens.
+  Furthermore, the Dag File Processor potentially has direct access to the metadata database (it
+  needs this to store serialized Dags). Dag author code executing in the Dag File Processor context
+  could potentially access the database directly, including the signing key configuration if it is
+  available in the process environment. If a Dag author obtains the JWT signing key, they could
+  potentially forge arbitrary tokens.
 
 **Dag File Processor and Triggerer are shared across teams**
   In the default deployment, a **single Dag File Processor instance** parses all Dag files and a
   **single Triggerer instance** handles all triggers — regardless of team assignment. There is no
   built-in support for running per-team Dag File Processor or Triggerer instances. This means that
-  Dag author code from different teams executes within the same process, sharing the in-process
-  Execution API and direct database access.
+  Dag author code from different teams executes within the same process, potentially sharing the
+  in-process Execution API and direct database access.
 
   For multi-team deployments that require separation, Deployment Managers must run **separate
   Dag File Processor and Triggerer instances per team** as a deployment-level measure (for example,
   by configuring each instance to only process bundles belonging to a specific team). However, even
-  with separate instances, each Dag File Processor and Triggerer retains direct access to the
-  metadata database — a Dag author whose code runs in these components can potentially access the
-  database directly, including reading or modifying data belonging to other teams, unless the
+  with separate instances, each Dag File Processor and Triggerer potentially retains direct access
+  to the metadata database — a Dag author whose code runs in these components can potentially access
+  the database directly, including reading or modifying data belonging to other teams, unless the
   Deployment Manager restricts the database credentials and configuration available to each instance.
 
 **No cross-workload isolation in the Execution API**
@@ -550,7 +552,7 @@ Dag authors executing arbitrary code
 
 Dag authors can execute arbitrary code on workers, the Dag File Processor, and the Triggerer. This
 includes accessing credentials, environment variables, and (in the case of the Dag File Processor
-and Triggerer) the metadata database directly. This is the intended behavior as described in
+and Triggerer) potentially the metadata database directly. This is the intended behavior as described in
 :ref:`capabilities-of-dag-authors` — Dag authors are trusted users. Reports that a Dag author can
 "achieve RCE" or "access the database" by writing Dag code are restating a documented capability,
 not discovering a vulnerability.
@@ -572,14 +574,14 @@ arbitrary code. See also :doc:`/security/sql`.
 An exception exists when official Airflow documentation explicitly recommends a pattern that leads
 to injection — in that case, the documentation guidance itself is the issue and may warrant
 an advisory.
 
-Dag File Processor and Triggerer having database access
-.......................................................
+Dag File Processor and Triggerer potentially having database access
+...................................................................
 
-The Dag File Processor requires direct database access to store serialized Dags. The Triggerer requires
-direct database access to manage trigger state. Both components execute user-submitted code (Dag files
-and trigger code respectively) and bypass JWT authentication via an in-process Execution API transport.
-These are intentional architectural choices, not vulnerabilities. They are documented in
-:ref:`jwt-authentication-and-workload-isolation`.
+The Dag File Processor potentially has direct database access to store serialized Dags. The Triggerer
+potentially has direct database access to manage trigger state. Both components execute user-submitted
+code (Dag files and trigger code respectively) and potentially bypass JWT authentication via an
+in-process Execution API transport. These are intentional architectural choices, not vulnerabilities.
+They are documented in :ref:`jwt-authentication-and-workload-isolation`.
 
 Workers accessing shared Execution API resources
 .................................................
diff --git a/docs/spelling_wordlist.txt b/docs/spelling_wordlist.txt
index bd5539dc85a..2ba6bf200d5 100644
--- a/docs/spelling_wordlist.txt
+++ b/docs/spelling_wordlist.txt
@@ -510,6 +510,7 @@ dttm
 dtypes
 du
 duckdb
+dumpable
 dunder
 dup
 durations
@@ -1384,6 +1385,7 @@ salesforce
 samesite
 saml
 sandboxed
+sandboxing
 sanitization
 sas
 Sasl
@@ -1728,6 +1730,7 @@ unpause
 unpaused
 unpausing
 unpredicted
+unsanitized
 untestable
 untransformed
 untrusted
@@ -1832,6 +1835,7 @@ Xiaodong
 xlarge
 xml
 xpath
+XSS
 xyz
 yaml
 Yandex
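The docs changed above repeatedly describe an in-process Execution API transport whose JWT bearer dependency is overridden to always return a synthetic token with the "execution" scope. As a rough, framework-free sketch of that dependency-override pattern (every name below is hypothetical and merely stands in for Airflow's real identifiers such as `InProcessExecutionAPI` and `TIToken`; this is not Airflow code):

```python
# Hypothetical sketch of the dependency-override pattern described in the
# docs above. None of these names are Airflow's actual identifiers.


class AuthError(Exception):
    """Raised when a request carries no valid bearer token."""


def jwt_bearer_dependency(token=None):
    # Normal HTTP path: reject requests without a valid, signed JWT.
    # (A stand-in string comparison replaces real signature validation.)
    if token != "valid-signed-jwt":
        raise AuthError("missing or invalid JWT")
    return {"scope": "execution", "sub": "task-instance-id"}


def in_process_override(token=None):
    # In-process transport: the bearer dependency is swapped out, so every
    # call is allowed and a synthetic token with the "execution" scope is
    # returned without any validation.
    return {"scope": "execution", "sub": "in-process"}


def call_execution_api(auth_dependency, token=None):
    # The API handler only sees whatever the injected dependency returns.
    claims = auth_dependency(token)
    return f"allowed with scope={claims['scope']}"
```

With `jwt_bearer_dependency` injected, a call without a token raises `AuthError`; with `in_process_override` injected, the same call succeeds unconditionally, which is the "potentially bypasses JWT authentication" behavior the docs describe for components hosting the API in-process.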
