+1 to Jarek's comments and questions here. I am concerned that these
proposed changes, taken at the level of individual PRs, could create an
illusion of security and lead to a stream of "security bugs" reported by
users whose expectations are very different from what we actually enforce.

We need to clearly articulate a set of security expectations before
addressing this in a set of PRs.

Vikram
On Fri, Feb 20, 2026 at 1:23 PM Jarek Potiuk <[email protected]> wrote:

> I think there is one more thing that I have mentioned a few times before,
> and it's time to put it into more concrete words.
>
> Currently there is **no** protection against tasks making claims that
> they belong to other tasks. While the running task by default receives
> its generated token from the supervisor, there is absolutely nothing
> stopping the forked task from inspecting the parent process's memory,
> obtaining the "supervisor" secret that is used to sign the "task" token,
> and generating a new token with **any** dag_id, task_id, or basically any
> other claim.
>
> This is by design: we currently have no such control implemented, and
> part of the security model of Airflow 3.0 - 3.1 is that any task can
> perform **any** action via the Task SDK. We never even attempt to verify
> which task or DAG state it may modify, or which connections or variables
> it may access. We only need to know that this "task" was authorized by
> the scheduler to call the Task SDK API.
>
> With multi-team, this assumption is broken. We **need** to know and
> **enforce** task identity provenance. One task pretending to be another
> task is no longer acceptable - it violates the basic isolation between
> teams.
>
> As I understand it, the current supervisor -> task JWT token generation
> works like this (please correct me if I am wrong):
>
> * when the supervisor starts, it reads the configuration ("jwt_secret" /
> "jwt_private_key_path" / "jwt_kid")
> * when it starts a task, it uses this secret to generate a JWT token for
> that task (with "dag_id", "dag_run_id", "task_instance_id" claims), and
> the supervisor uses that token to communicate with the api_server
> * the forked task has no direct reference to that token nor to the
> jwt_secret when started - neither is passed to it
> * the executing task process is only supposed to communicate with the
> supervisor via in-process communication; it does not open a connection
> nor use the JWT token directly
>
> Now... the interesting thing is that while the forked process has no
> "easy" API for this, it can not only obtain the token and use it
> directly, but also generate a NEW token. No matter how hard we try, the
> forked task will **always** be able to access the jwt_secret and create
> its own JWT token - with **ANY** claims it likes. That is simply a
> consequence of our fork model. Additionally, the (default) approach of
> running the forked process as the same unix user lets the forked process
> read **any** of the information the supervisor process can access
> (including configuration files, env variables, and even the memory of the
> supervisor process).
>
> There are two ways a running task can get the JWT_SECRET:
>
> * since the task process is forked from the supervisor, everything the
> parent process has in memory is available to it - even if the method
> executed in the fork has no direct reference to it. The forked process
> can use "globals" and reach any variable, function, class, or method that
> the parent supervisor process has; it can read any data in the memory of
> the process. So if the jwt_secret is already in the memory of the parent
> process when the task process is forked, it is also in the memory of the
> task process.
>
> * since the task process runs as the same unix user as the parent
> process, it has access to all the same configuration and environment
> data. Even if the parent process clears os.environ, the child process can
> read the original environment variables the parent process was started
> with via the `/proc` filesystem (it just needs to know the parent process
> id - which it always has). Unless more sophisticated mechanisms are used,
> such as SELinux (requires a kernel with SELinux and system-level SELinux
> rules configured), user impersonation, or cgroups/proper access control
> to files (requires sudo access for the parent process) - such a forked
> process can do **everything** the parent process can do, including
> reading the jwt_secret configuration and creating JWT tokens with (again)
> any task_instance_id, dag_id, and dag_run_id claims.
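>
> To make this concrete, here is a rough sketch of what a malicious forked
> task could do. I am assuming PyJWT and HS512 here, and the env variable
> and claim names are illustrative - not necessarily exactly what the Task
> SDK uses:
>
>     import os
>     import jwt  # PyJWT
>
>     # Way 2: recover the parent's original environment from /proc,
>     # even if the supervisor cleared os.environ after startup. This
>     # works because the child runs as the same unix user.
>     with open(f"/proc/{os.getppid()}/environ", "rb") as f:
>         parent_env = dict(
>             item.split(b"=", 1)
>             for item in f.read().split(b"\x00")
>             if b"=" in item
>         )
>     secret = parent_env[b"AIRFLOW__API_AUTH__JWT_SECRET"].decode()
>
>     # With the secret in hand, mint a token with ANY claims we like.
>     forged = jwt.encode(
>         {"task_instance_id": "any-uuid", "dag_id": "some-other-teams-dag"},
>         secret,
>         algorithm="HS512",
>     )
>
> (Way 1 - walking the parent's globals after fork - does not even need the
> filesystem.)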
>
> So no matter what we do on the "server" side, the client side
> (supervisor) in the default configuration already allows a task to
> pretend to be "whatever dag_id it likes" - in which case server-side
> verification is pointless.
>
> I believe (Ashb? Kaxil? Amogh?) that was a deliberate decision when the
> API was designed and the Task SDK / JWT token for Airflow 3.0 was
> implemented (because we did not need it).
>
> I would love to hear if my thinking is wrong, but I highly doubt it, so I
> wonder what the original thoughts were on how task identity can have
> "cryptographically strong" provenance? I have some ideas for that, but I
> would love to hear the original authors' thoughts first.
>
> J.
>
> On Fri, Feb 20, 2026 at 8:49 PM Anish Giri <[email protected]>
> wrote:
>
> > Thanks, Vincent! I appreciate your review. I'll get started on the
> > implementation and tag you on the PRs.
> >
> > Anish
> >
> > On Fri, Feb 20, 2026 at 8:23 AM Vincent Beck <[email protected]>
> > wrote:
> > >
> > > Hey Anish,
> > >
> > > Everything you said makes sense to me. I might have questions on
> > > specifics, but I'd rather keep them for the PRs - that will make
> > > everything much easier.
> > >
> > > Feel free to ping me on all your PRs,
> > > Vincent
> > >
> > > On 2026/02/20 07:34:47 Anish Giri wrote:
> > > > Hello everyone,
> > > >
> > > > Jarek asked for a proposal on #60125 [1] before implementing access
> > > > control for the Execution API's resource endpoints (variables,
> > > > connections, XComs), so here it is.
> > > >
> > > > After going through the codebase, I think this is really about
> > > > completing AIP-67's [2] multi-team boundary enforcement rather than
> > > > introducing a new security model. Most of the infrastructure
> > > > already exists. What's missing are the actual authorization checks.
> > > >
> > > > The current state:
> > > >
> > > > The Execution API has three authorization stubs that always return
> > > > True:
> > > >
> > > > - has_variable_access() in execution_api/routes/variables.py
> > > > - has_connection_access() in execution_api/routes/connections.py
> > > > - has_xcom_access() in execution_api/routes/xcoms.py
> > > >
> > > > All three carry a "# TODO: Placeholder for actual implementation"
> > > > comment.
> > > >
> > > > For variables and connections, vincbeck's data-layer team scoping
> > > > (#58905 [4], #59476 [5]) already prevents cross-team data retrieval
> > > > in practice: a cross-team request returns a 404 rather than the
> > > > resource. So the data isolation is there, but the auth stubs don't
> > > > reject these requests early with a proper 403, and there's no
> > > > second layer of protection at the auth check itself.
> > > >
> > > > For XComs, the situation is different. There is no isolation at any
> > > > layer. XCom routes take dag_id, run_id, and task_id directly from
> > > > URL path parameters with no validation against the calling task's
> > > > identity. A task in Team-A's bundle can currently read and write
> > > > Team-B's XComs.
> > > >
> > > > There's already a get_team_name_dep() function in deps.py that
> > > > resolves a task's team via TaskInstance -> DagModel ->
> > > > DagBundleModel -> Team in a single join query. The variable and
> > > > connection endpoints already use it. XCom routes don't use it at
> > > > all.
> > > >
> > > > Proposed approach:
> > > >
> > > > I'm thinking of this in two parts:
> > > >
> > > > 1) Team boundary checks for variables and connections
> > > >
> > > > Fill the auth stubs with team boundary checks. For reference, the
> > > > Core API does this in security.py: requires_access_variable()
> > > > resolves the resource's team via Variable.get_team_name(key), wraps
> > > > it in VariableDetails, and passes it to
> > > > auth_manager.is_authorized_variable(method, details, user). The
> > > > auth manager then checks team membership.
> > > >
> > > > For the Execution API, the flow would be similar but without going
> > > > through the auth manager (I'll explain why below):
> > > >
> > > > variable_key -> Variable.get_team_name(key) -> resource_team
> > > > token.id -> get_team_name_dep() -> task_team
> > > > Deny if resource_team != task_team (when both are non-None)
> > > >
> > > > When core.multi_team is disabled, get_team_name_dep returns None
> > > > and the check is skipped, so current single-team behavior stays
> > > > exactly the same.
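> > > >
> > > > Roughly, the filled-in stub could look like the sketch below. The
> > > > dependency wiring is illustrative (I'm assuming a plain FastAPI
> > > > dependency), and the helper names are the ones mentioned above:
> > > >
> > > >     from fastapi import HTTPException, status
> > > >
> > > >     from airflow.models import Variable
> > > >
> > > >     def has_variable_access(
> > > >         variable_key: str,
> > > >         task_team: str | None,  # resolved via get_team_name_dep()
> > > >     ) -> None:
> > > >         """Reject cross-team variable access with an early 403."""
> > > >         resource_team = Variable.get_team_name(variable_key)
> > > >         # Skip the check when multi-team is off (task_team is None)
> > > >         # or the resource is not team-scoped.
> > > >         if (
> > > >             task_team is not None
> > > >             and resource_team is not None
> > > >             and resource_team != task_team
> > > >         ):
> > > >             raise HTTPException(
> > > >                 status_code=status.HTTP_403_FORBIDDEN,
> > > >                 detail="Task's team does not match the variable's team",
> > > >             )
> > > >
> > > > The connection check would be the same shape with
> > > > Connection.get_team_name.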
> > > >
> > > > 2) XCom authorization
> > > >
> > > > This is the harder part. For writes, I think we should verify that
> > > > the calling task is writing its own XComs - the task identity from
> > > > the JWT should match the dag_id/task_id in the URL path. For reads,
> > > > enforce the team boundary, so a task can only read XComs from tasks
> > > > within the same team. This would allow cross-DAG xcom_pull within a
> > > > team (which people already do) while preventing cross-team access.
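> > > >
> > > > The write-ownership part would be something like this (hypothetical
> > > > wiring again - it assumes the task's identity is available from the
> > > > validated token, which is exactly what the next paragraph
> > > > proposes):
> > > >
> > > >     from fastapi import HTTPException, status
> > > >
> > > >     def verify_xcom_write_ownership(
> > > >         token_dag_id: str,
> > > >         token_task_id: str,
> > > >         path_dag_id: str,
> > > >         path_task_id: str,
> > > >     ) -> None:
> > > >         """A task may only write XComs under its own identity."""
> > > >         if (token_dag_id, token_task_id) != (path_dag_id, path_task_id):
> > > >             raise HTTPException(
> > > >                 status_code=status.HTTP_403_FORBIDDEN,
> > > >                 detail="Task may only write its own XComs",
> > > >             )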
> > > >
> > > > To avoid a DB lookup on every request, I'd propose adding dag_id to
> > > > the JWT claims at generation time. The dag_id is already on the
> > > > TaskInstance schema in ExecuteTask.make() (workloads.py:142). The
> > > > JWTReissueMiddleware already preserves all claims during token
> > > > refresh, so this wouldn't break anything. Adding task_id and run_id
> > > > to the token could be done as a follow-up - there's a TODO at
> > > > xcoms.py:315 about eventually deriving these from the token instead
> > > > of the URL.
> > > >
> > > > I'm not proposing to add team_name to the token. It's not available
> > > > on the TaskInstance schema at generation time, and resolving it
> > > > requires a DB join through DagModel -> DagBundleModel -> Team,
> > > > which would slow down the scheduler's task-queuing path. Better to
> > > > resolve it at request time via get_team_name_dep.
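> > > >
> > > > Conceptually, the claim addition is just the sketch below. The
> > > > claim names are illustrative - I'd mirror whatever the existing
> > > > token payload uses:
> > > >
> > > >     def make_task_claims(ti) -> dict[str, str]:
> > > >         """Sketch: claims for the per-task JWT."""
> > > >         return {
> > > >             "sub": str(ti.id),    # existing identity claim
> > > >             "dag_id": ti.dag_id,  # new: lets the server verify provenance
> > > >             # "task_id" / "run_id" could follow as a later step
> > > >         }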
> > > >
> > > > Why not go through BaseAuthManager?
> > > >
> > > > One design question I want to raise: the Execution API auth stubs
> > > > currently don't call BaseAuthManager.is_authorized_*(), and I think
> > > > they probably shouldn't. The BaseAuthManager interface is designed
> > > > around human identity (BaseUser with roles and team memberships),
> > > > while the Execution API operates on task identity (TIToken with a
> > > > UUID). These are very different things. A task doesn't have a
> > > > "role" in the RBAC sense; it has a team derived from its DAG's
> > > > bundle.
> > > >
> > > > I'm leaning toward keeping the authorization logic directly in the
> > > > has_*_access dependency functions, using get_team_name_dep for team
> > > > resolution. This keeps the Execution API auth simple and avoids
> > > > tying task authorization to the human auth manager. But I'd like to
> > > > hear whether others think we should instead extend BaseAuthManager
> > > > with task-identity-aware methods.
> > > >
> > > > What about single-team deployments?
> > > >
> > > > When core.multi_team=False (the default for most deployments), the
> > > > team boundary checks would be skipped entirely for variables and
> > > > connections. For XComs, I think write-ownership verification (a
> > > > task can only write its own XComs) is worth keeping regardless of
> > > > multi-team mode - it's more of a correctness concern than an
> > > > authorization one. But I can also see the argument for a complete
> > > > no-op when multi_team is off, to keep things simple.
> > > >
> > > > Out of scope:
> > > >
> > > > AIP-72 [3] mentions three possible authorization models:
> > > > pre-declaration (DAGs declare required resources), runtime requests
> > > > with a deployment-level policy, and OPA integration via WASM
> > > > bindings. I'm not trying to address any of those here. The
> > > > team-boundary enforcement is the base that all three future models
> > > > need.
> > > >
> > > > Implementation plan:
> > > >
> > > > 1. Add the dag_id claim to JWT token generation in workloads.py
> > > > 2. Implement the has_variable_access team boundary check
> > > > 3. Implement the has_connection_access team boundary check
> > > > 4. Implement has_xcom_access with write ownership + team boundary
> > > > 5. Add XCom team resolution (XCom routes currently have no
> > > > get_team_name_dep usage)
> > > > 6. Tests for all authorization scenarios, including cross-team
> > > > denial
> > > > 7. Documentation update for multi-team authorization behavior
> > > >
> > > > This should be a fairly small change - mostly filling in the
> > > > existing stubs with actual checks.
> > > >
> > > > Let me know what you think.
> > > >
> > > > Anish
> > > >
> > > > [1] https://github.com/apache/airflow/issues/60125#issuecomment-3712218766
> > > > [2] https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-67+Multi-team+deployment+of+Airflow+components
> > > > [3] https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-72+Task+Execution+Interface+aka+Task+SDK
> > > > [4] https://github.com/apache/airflow/pull/58905
> > > > [5] https://github.com/apache/airflow/pull/59476