There is one more thing I have mentioned a few times already, and it's time
to put it in more concrete words.
Currently there is **no** protection against tasks claiming that they
belong to other tasks. While the running task by default receives the
generated token to use from the supervisor, there is absolutely nothing
stopping the forked task from inspecting the parent process memory, getting
the "supervisor" secret that is used to sign the "task" token, and
generating a new token with **any** dag_id, task_id or basically any other
claim.
This is by design currently, because we have no access control implemented,
and part of the security model of Airflow 3.0 - 3.1 is that any task can
perform **any** action via the Task SDK. We never even attempt to verify
which task or DAG state it can modify, or which connections and variables
it accesses. We only need to know that this "task" was authorised by the
scheduler to call the "task-sdk" API.
With multi-team, this assumption is broken. We **need** to know and
**enforce** task identity provenance. A situation where one task pretends
to be another task is no longer acceptable - it violates basic isolation
between the teams.
As I understand it, the current supervisor -> task JWT token generation
works like this (and please correct me if I am wrong):
* when the supervisor starts, it reads its configuration ("jwt_secret" /
"jwt_private_key_path" / "jwt_kid")
* when it starts a task, it uses this "secret" to generate a JWT token for
that task (with "dag_id", "dag_run_id" and "task_instance_id" claims, as
sketched below) - and that token is used by the supervisor to communicate
with the api_server
* the forked task gets no direct reference to that token nor to the
jwt_secret when started - neither is passed to it
* the executing task process is only supposed to communicate with the
supervisor via in-process communication; it does not open a connection nor
use the JWT token directly
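
To make the flow concrete, here is a minimal sketch of what I believe the
generation step looks like, assuming a shared-secret HS256 setup with
PyJWT. The function name and exact claim keys are my assumptions, not the
actual Task SDK code:

```python
import jwt  # PyJWT

def generate_task_token(
    jwt_secret: str, dag_id: str, dag_run_id: str, task_instance_id: str
) -> str:
    # Supervisor side: sign the per-task claims with the shared secret.
    # Claim names mirror the ones listed above; the real code may differ.
    claims = {
        "dag_id": dag_id,
        "dag_run_id": dag_run_id,
        "task_instance_id": task_instance_id,
    }
    return jwt.encode(claims, jwt_secret, algorithm="HS256")
```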
Now ... the interesting thing is that while the forked process has no
"easy" API to get the token and use it directly, or to generate a NEW
token, no matter how hard we try the forked task will **always** be able to
access the "jwt_secret" and create its own JWT token - with **ANY** claims
it likes. That is simply a consequence of our fork model. Additionally, the
(default) approach of running the forked process as the same unix user
enables it to read **any** of the information the supervisor process
accesses (including configuration files, environment variables and even the
memory of the supervisor process).
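
For example (a hedged sketch - the config section/key are my assumptions
about where the shared secret lives, the point being only that the fork can
read the same configuration the supervisor reads):

```python
import jwt  # PyJWT
from airflow.configuration import conf

# Assumed location of the shared signing secret - illustrative only.
secret = conf.get("api_auth", "jwt_secret")

# Mint a token with entirely attacker-chosen claims.
forged = jwt.encode(
    {
        "dag_id": "some_other_teams_dag",
        "dag_run_id": "manual__2026-02-21",
        "task_instance_id": "any-uuid-i-like",
    },
    secret,
    algorithm="HS256",
)
```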
There are two ways a running task can get the jwt_secret:
* since the task process is forked from the supervisor, it inherits
everything the parent process has in memory - even if the method executed
in the fork has no direct reference to it. The forked process can use
"globals" and get to any variable, function, class or method that the
parent supervisor process has; it can read any data in the memory of the
process. So if the jwt_secret is already in memory of the parent process
when the task process is forked, it is also in memory of the task process
* since the task process runs as the same unix user as the parent process,
it has access to all the same configuration and environment data. Even if
the parent process clears os.environ, the child process can read the
original environment variables the parent process was started with via the
`/proc` filesystem (it just needs to know the parent process id - which it
always has; see the sketch after this list). Unless more sophisticated
mechanisms are used - such as SELinux (requires a kernel with SELinux and
configured system-level SELinux rules), user impersonation, or cgroups and
proper access control to files (requires sudo access for the parent
process) - such a forked process can do **everything** the parent process
can do, including reading the jwt_secret configuration and creating JWT
tokens with (again) any task_instance_id, dag_id and dag_run_id claims.
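
A minimal sketch of the `/proc` technique (Linux-only, assuming the default
same-unix-user fork model):

```python
import os

def read_parent_environ() -> dict[str, str]:
    # /proc/<pid>/environ holds the environment the process was *started*
    # with, NUL-separated - regardless of later changes to os.environ.
    ppid = os.getppid()  # the forked task always knows its parent's pid
    with open(f"/proc/{ppid}/environ", "rb") as f:
        raw = f.read().decode(errors="replace")
    return dict(
        entry.split("=", 1) for entry in raw.split("\0") if "=" in entry
    )
```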
So no matter what we do on the "server" side, the client side (supervisor)
in the default configuration already allows the task to pretend to be
"whatever dag_id" - in which case server-side verification is pointless.
I believe (Ashb? Kaxil? Amogh?) that was a deliberate decision made when
the API was designed and the Task SDK / JWT token for Airflow 3.0 was
implemented (because we did not need it then).
I would love to hear if my thinking is wrong, but I highly doubt it, so I
wonder what the original thoughts were on how the task identity can get
"cryptographically strong" provenance? I have some ideas for that, but I
would love to hear the original authors' thoughts first.
J.
On Fri, Feb 20, 2026 at 8:49 PM Anish Giri <[email protected]> wrote:
> Thanks, Vincent! I appreciate your review. I'll get started on the
> implementation and tag you on the PRs.
>
> Anish
>
> On Fri, Feb 20, 2026 at 8:23 AM Vincent Beck <[email protected]> wrote:
> >
> > Hey Anish,
> >
> > Everything you said makes sense to me. I might have questions on
> > specifics, but I'd rather keep them for PRs; that'll make everything way
> > easier.
> >
> > Feel free to ping me on all your PRs,
> > Vincent
> >
> > On 2026/02/20 07:34:47 Anish Giri wrote:
> > > Hello everyone,
> > >
> > > Jarek asked for a proposal on #60125 [1] before implementing access
> > > control for the Execution API's resource endpoints (variables,
> > > connections, XComs), so here it is.
> > >
> > > After going through the codebase, I think this is really about
> > > completing AIP-67's [2] multi-team boundary enforcement rather than
> > > introducing a new security model. Most of the infrastructure already
> > > exists. What's missing are the actual authorization checks.
> > >
> > > The current state:
> > >
> > > The Execution API has three authorization stubs that always return True:
> > >
> > > - has_variable_access() in execution_api/routes/variables.py
> > > - has_connection_access() in execution_api/routes/connections.py
> > > - has_xcom_access() in execution_api/routes/xcoms.py
> > >
> > > All three have a "# TODO: Placeholder for actual implementation" comment.
> > >
> > > For variables and connections, vincbeck's data-layer team scoping
> > > (#58905 [4], #59476 [5]) already prevents cross-team data retrieval in
> > > practice. A cross-team request returns a 404 rather than the resource.
> > > So the data isolation is there, but the auth stubs don't reject these
> > > requests early with a proper 403, and there's no second layer of
> > > protection at the auth check itself.
> > >
> > > For XComs, the situation is different. There is no isolation at any
> > > layer. XCom routes take dag_id, run_id, and task_id directly from URL
> > > path parameters with no validation against the calling task's
> > > identity. A task in Team-A's bundle can currently read and write
> > > Team-B's XComs.
> > >
> > > There's already a get_team_name_dep() function in deps.py that
> > > resolves a task's team via TaskInstance -> DagModel -> DagBundleModel
> > > -> Team in a single join query. The variable and connection endpoints
> > > already use it. XCom routes don't use it at all.
> > >
> > > Proposed approach:
> > >
> > > I'm thinking of this in two parts:
> > >
> > > 1) Team boundary checks for variables and connections
> > >
> > > Fill the auth stubs with team boundary checks. For reference, the Core
> > > API does this in security.py. requires_access_variable() resolves the
> > > resource's team via Variable.get_team_name(key), wraps it in
> > > VariableDetails, and passes it to
> > > auth_manager.is_authorized_variable(method, details, user). The auth
> > > manager then checks team membership.
> > >
> > > For the Execution API, the flow would be similar but without going
> > > through the auth manager (I'll explain why below):
> > >
> > > variable_key -> Variable.get_team_name(key) -> resource_team
> > > token.id -> get_team_name_dep() -> task_team
> > > Deny if resource_team != task_team (when both are non-None)
> > >
> > > When core.multi_team is disabled, get_team_name_dep returns None and
> > > the check is skipped, so current single-team behavior stays exactly
> > > the same.
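> > >
> > > A rough sketch of what that check could look like inside the stub
> > > (illustrative names, not the final code -- it assumes
> > > Variable.get_team_name(key) and the task team resolved via
> > > get_team_name_dep(), both described above):
> > >
> > > ```python
> > > from fastapi import HTTPException, status
> > >
> > > from airflow.models import Variable
> > >
> > > def has_variable_access(key: str, task_team: str | None) -> None:
> > >     resource_team = Variable.get_team_name(key)
> > >     # Either side without a team => multi_team off or unscoped: skip.
> > >     if task_team is None or resource_team is None:
> > >         return
> > >     if resource_team != task_team:
> > >         raise HTTPException(
> > >             status.HTTP_403_FORBIDDEN,
> > >             detail="Task's team does not match the variable's team",
> > >         )
> > > ```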
> > >
> > > 2) XCom authorization
> > >
> > > This is the harder part. For writes, I think we should verify the
> > > calling task is writing its own XComs -- the task identity from the
> > > JWT should match the dag_id/task_id in the URL path. For reads,
> > > enforce team boundary so a task can only read XComs from tasks within
> > > the same team. This would allow cross-DAG xcom_pull within a team
> > > (which people already do) while preventing cross-team access.
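> > >
> > > A sketch of the write-ownership part (claim names assume the dag_id --
> > > and later task_id -- claims proposed below are present in the token):
> > >
> > > ```python
> > > from fastapi import HTTPException, status
> > >
> > > def verify_xcom_write(claims: dict, dag_id: str, task_id: str) -> None:
> > >     # The identity baked into the JWT must match the URL path params.
> > >     if claims.get("dag_id") != dag_id or claims.get("task_id") != task_id:
> > >         raise HTTPException(
> > >             status.HTTP_403_FORBIDDEN,
> > >             detail="A task may only write its own XComs",
> > >         )
> > > ```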
> > >
> > > To avoid a DB lookup on every request, I'd propose adding dag_id to
> > > the JWT claims at generation time. The dag_id is already on the
> > > TaskInstance schema in ExecuteTask.make() (workloads.py:142). The
> > > JWTReissueMiddleware already preserves all claims during token
> > > refresh, so this wouldn't break anything. Adding task_id and run_id to
> > > the token could be done as a follow-up -- there's a TODO at
> > > xcoms.py:315 about eventually deriving these from the token instead of
> > > the URL.
> > >
> > > I'm not proposing to add team_name to the token. It's not available on
> > > the TaskInstance schema at generation time. Resolving it requires a DB
> > > join through DagModel -> DagBundleModel -> Team, which would slow down
> > > the scheduler's task queuing path. Better to resolve it at request
> > > time via get_team_name_dep.
> > >
> > > Why not go through BaseAuthManager?
> > >
> > > One design question I want to raise: the Execution API auth stubs
> > > currently don't call BaseAuthManager.is_authorized_*(), and I think
> > > they probably shouldn't. The BaseAuthManager interface is designed
> > > around human identity (BaseUser with roles and team memberships), but
> > > the Execution API operates on task identity (TIToken with a UUID).
> > > These are very different things. A task doesn't have a "role" in the
> > > RBAC sense, it has a team derived from its DAG's bundle.
> > >
> > > I'm leaning toward keeping the authorization logic directly in the
> > > has_*_access dependency functions, using get_team_name_dep for team
> > > resolution. This keeps the Execution API auth simple and avoids tying
> > > task authorization to the human auth manager. But I'd like to hear if
> > > others think we should instead extend BaseAuthManager with
> > > task-identity-aware methods.
> > >
> > > What about single-team deployments?
> > >
> > > When core.multi_team=False (the default for most deployments), the
> > > team boundary checks would be skipped entirely for variables and
> > > connections. For XComs, I think write ownership verification (task can
> > > only write its own XComs) is worth keeping regardless of multi-team
> > > mode -- it's more of a correctness concern than an authorization one.
> > > But I can also see the argument for a complete no-op when multi_team
> > > is off to keep things simple.
> > >
> > > Out of scope:
> > >
> > > AIP-72 [3] mentions three possible authorization models:
> > > pre-declaration (DAGs declare required resources), runtime request
> > > with deployment-level policy, and OPA integration via WASM bindings.
> > > I'm not trying to address any of those here. The team-boundary
> > > enforcement is the base that all three future models need.
> > >
> > > Implementation plan:
> > >
> > > 1. Add dag_id claim to JWT token generation in workloads.py
> > > 2. Implement has_variable_access team boundary check
> > > 3. Implement has_connection_access team boundary check
> > > 4. Implement has_xcom_access with write ownership + team boundary
> > > 5. Add XCom team resolution (XCom routes currently have no
> > > get_team_name_dep usage)
> > > 6. Tests for all authorization scenarios including cross-team denial
> > > 7. Documentation update for multi-team authorization behavior
> > >
> > > This should be a fairly small change -- mostly filling in the existing
> > > stubs with actual checks.
> > >
> > > Let me know what you think.
> > >
> > > Anish
> > >
> > > [1] https://github.com/apache/airflow/issues/60125#issuecomment-3712218766
> > > [2] https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-67+Multi-team+deployment+of+Airflow+components
> > > [3] https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-72+Task+Execution+Interface+aka+Task+SDK
> > > [4] https://github.com/apache/airflow/pull/58905
> > > [5] https://github.com/apache/airflow/pull/59476
> > >