It does not change much (and is not good for performance). "Spawn" suffers from the same problem of having access to the configuration that the supervisor has. If the supervisor can read all the configuration needed to get JWT_secret, then a process spawned from it can repeat the same steps the supervisor did to obtain JWT_secret and create a JWT_token with any claims. Also, such spawned processes can dump the memory of the parent process via `/proc/<pid>/mem` if they run as the same user, or use `gcore PID` to dump the memory of the process to a file.
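To make that concrete, here is a minimal sketch (Linux only, stdlib only; the function name and the stand-in secret are illustrative, not Airflow code) showing that a forked child both inherits the parent's globals and can re-read the parent's startup environment through `/proc`:

```python
import os

# Hypothetical secret loaded by the "supervisor" (parent) before forking.
JWT_SECRET = "s3cr3t"  # stand-in for the real jwt_secret configuration

def child_can_see_parent_state() -> tuple[str, bytes]:
    """Fork a child and show it inherits the parent's globals and can
    read the parent's original environment via /proc (Linux only)."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:  # child process
        os.close(r)
        # 1) The secret is simply present in our copy-on-write memory.
        leaked = JWT_SECRET
        # 2) We can also re-read the parent's startup environment,
        #    even if the parent later cleared os.environ.
        with open(f"/proc/{os.getppid()}/environ", "rb") as f:
            parent_env = f.read()
        os.write(w, leaked.encode() + b"\0" + parent_env[:32])
        os._exit(0)
    os.close(w)
    data = b""
    while chunk := os.read(r, 4096):
        data += chunk
    os.waitpid(pid, 0)
    secret, _, env_prefix = data.partition(b"\0")
    return secret.decode(), env_prefix
```

No amount of "not passing the secret" to the child helps here: the child recovers it from its own inherited memory or from `/proc`.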
This is controlled by the "ptrace" permission, which is generally enabled on all Linux systems by default (in order to enable debugging - for example, gdb attaching to a running process, or dumping core with gcore). You can disable this permission with SELinux or the Yama Linux Security Module. But even that does not restrict the capability of a spawned process to simply read the same configuration files or environment variables that the main process had and re-create a JWT_token with any claims. It's just how unix user process separation works - any process of a UNIX user can, by default, do **anything** with any other process of the same UNIX user.

J.

On Sat, Feb 21, 2026 at 1:27 AM Anish Giri <[email protected]> wrote:

> Hi Jarek, Vikram,
>
> Thanks for this, and I am really very glad that I posted it before writing any code.
>
> I spent some time going through your point about the fork model and the signing key. That's something I hadn't considered at all. I went and looked now at how the key flows through the code, and you're right that with fork the scheduler's heap gets inherited via copy-on-write, so the key material ends up in the worker's address space even though it is never explicitly passed. The task code runs in a second fork inside the supervisor, so it inherits the same memory. So the identity model isn't secure in the fork model, no matter what we build on top of it.
>
> There is one thing I was wondering, and please correct me if I am wrong: would switching from **fork** to **spawn** for the worker processes help here? Spawned workers start with a clean interpreter, so the signing key never enters their address space. And since the supervisor's fork inherits from the worker (which never had the key), the task would not have it either.
>
> Not sure if I'm oversimplifying it, though. You mentioned having some ideas on the cryptographically strong provenance side and I would really like to hear them.
>
> Anish
>
> On Fri, Feb 20, 2026 at 3:32 PM Vikram Koka via dev <[email protected]> wrote:
> >
> > +1 to Jarek's comments and questions here.
> >
> > I am concerned that these proposed changes at the PR level could create an illusion of security, potentially leading to many "security bugs" reported by users who may have a very different expectation.
> >
> > We need to clearly articulate a set of security expectations here before addressing this in a set of PRs.
> >
> > Vikram
> >
> > On Fri, Feb 20, 2026 at 1:23 PM Jarek Potiuk <[email protected]> wrote:
> >
> > > I think there is one more thing that I've been mentioning some time ago, and it's time to put it in more concrete words.
> > >
> > > Currently there is **no** protection against tasks making claims that they belong to other tasks. While the running task by default receives the generated token to use from the supervisor, there is absolutely no problem for the forked task to inspect parent process memory to get the "supervisor" token that is used to sign the "task" token, and to generate a new token with **any** dag_id or task_id or basically any other claim.
> > >
> > > This is by design currently, because we do not have any control implemented, and part of the security model of Airflow 3.0 - 3.1 is that any task can perform **any** action on the task SDK; we never even attempt to verify which task or dag state it can modify, or which connections or variables it accesses. We only need to know that this "task" was authorised by the scheduler to call the "task-sdk" API.
> > >
> > > With multi-team, this assumption is broken. We **need** to know and **enforce** task_id provenance. The situation where one task pretends to be another task is not acceptable any more - it violates basic isolation between the teams.
> > >
> > > As I understand it, the way the current supervisor -> task JWT token generation works is (and please correct me if I am wrong):
> > >
> > > * when the supervisor starts, it reads the configuration ("jwt_secret" / "jwt_private_key_path" / "jwt_kid")
> > > * when it starts a task, it uses this "secret" to generate a JWT_token for that task (with "dag_id", "dag_run_id", "task_instance_id" claims) - and this token is used by the supervisor to communicate with the api_server
> > > * the forked task does not have a direct reference to that token nor to the jwt_secret when started - it does not get it passed
> > > * the executing task process is only supposed to communicate with the supervisor via in-process communication; it does not open a connection nor use the JWT_token directly
> > >
> > > Now ... the interesting thing is that while the forked process does not have an "easy" API to get the token and use it directly, or to generate a NEW token, no matter how hard we try the forked task will **always** be able to access "jwt_secret" and create its own JWT_token - and add **ANY** claims to that token. That's simply a consequence of using our fork model. An additional thing is that the (default) approach of using the same unix user in the forked process enables the forked process to read **any** of the information that the supervisor process accesses (including configuration files, env variables and even the memory of the supervisor process).
> > >
> > > There are two ways a running task can get JWT_SECRET:
> > >
> > > * since the task process is forked from the supervisor - it inherits everything that the parent process has in memory, even if the method executed in the fork has no direct reference to it. The forked process can use "globals" and get to any variable, function, class, or method that the parent supervisor process has.
It can read any data in the memory of the process. So if the JWT_Secret is already in the memory of the parent process when the task process is forked, it also is in the memory of the task process.
> > >
> > > * since the task process runs as the same unix user as the parent process - it has access to all the same configuration and environment data. Even if the parent process clears os.environ, the child process can read the original environment variables the parent process was started with using the `/proc` filesystem (it just needs to know the parent process id - which it always has). Unless more sophisticated mechanisms are used, such as SELinux (requires a kernel with SELinux and configured system-level SELinux rules), user impersonation, or cgroups/proper access control to files (requires sudo access for the parent process) - such a forked process can do **everything** the parent process can do, including reading the configuration of JWT_secret and creating JWT_tokens with (again) any task_instance_id, dag_id, and dag_run_id claim.
> > >
> > > So no matter what we do on the "server" side - the client side (supervisor), in the default configuration, already allows the task to pretend to be "whatever dag id" - in which case server-side verification is pointless.
> > >
> > > I believe (Ashb? Kaxil? Amogh?) that was a deliberate decision when the API was designed, when the Task SDK / JWT token for Airflow 3.0 was implemented (because we did not need it).
> > >
> > > I would love to hear if my thinking is wrong, but I highly doubt it, so I wonder what the original thoughts were on how the task identity can have "cryptographically strong" provenance? I have some ideas for that, but I would love to hear what the original authors' thoughts are.
> > >
> > > J.
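The forging step described in the quoted message can be sketched with the stdlib alone (HS256 and the claim names are assumptions for illustration; a real implementation would use a JWT library such as PyJWT). The point is that signature verification alone cannot distinguish a forged token from a supervisor-issued one once the secret has leaked:

```python
import base64
import hashlib
import hmac
import json

def _b64url(data: bytes) -> str:
    # JWTs use unpadded URL-safe base64.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def forge_token(jwt_secret: str, **claims) -> str:
    """Mint an HS256 JWT with arbitrary claims. Any process that can
    read jwt_secret (e.g. a forked task) can do exactly this."""
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url(json.dumps(claims).encode())
    signing_input = f"{header}.{payload}".encode()
    sig = hmac.new(jwt_secret.encode(), signing_input, hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url(sig)}"

def verify_token(jwt_secret: str, token: str) -> dict:
    """What a naive server-side check does: valid signature -> trust claims."""
    header, payload, sig = token.split(".")
    expected = hmac.new(
        jwt_secret.encode(), f"{header}.{payload}".encode(), hashlib.sha256
    ).digest()
    if not hmac.compare_digest(_b64url(expected), sig):
        raise ValueError("bad signature")
    return json.loads(base64.urlsafe_b64decode(payload + "=" * (-len(payload) % 4)))
```

The forged token passes verification with any `dag_id` / `task_instance_id` the attacker chooses, which is why server-side checks keyed on those claims are pointless under the shared-secret fork model.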
> > >
> > > On Fri, Feb 20, 2026 at 8:49 PM Anish Giri <[email protected]> wrote:
> > >
> > > > Thanks, Vincent! I appreciate your review. I'll get started on the implementation and tag you on the PRs.
> > > >
> > > > Anish
> > > >
> > > > On Fri, Feb 20, 2026 at 8:23 AM Vincent Beck <[email protected]> wrote:
> > > >
> > > > > Hey Anish,
> > > > >
> > > > > Everything you said makes sense to me. I might have questions on specifics, but I'd rather keep them for the PRs - that'll make everything way easier.
> > > > >
> > > > > Feel free to ping me on all your PRs,
> > > > > Vincent
> > > > >
> > > > > On 2026/02/20 07:34:47 Anish Giri wrote:
> > > > > > Hello everyone,
> > > > > >
> > > > > > Jarek asked for a proposal on #60125 [1] before implementing access control for the Execution API's resource endpoints (variables, connections, XComs), so here it is.
> > > > > >
> > > > > > After going through the codebase, I think this is really about completing AIP-67's [2] multi-team boundary enforcement rather than introducing a new security model. Most of the infrastructure already exists. What's missing are the actual authorization checks.
> > > > > >
> > > > > > The current state:
> > > > > >
> > > > > > The Execution API has three authorization stubs that always return True:
> > > > > >
> > > > > > - has_variable_access() in execution_api/routes/variables.py
> > > > > > - has_connection_access() in execution_api/routes/connections.py
> > > > > > - has_xcom_access() in execution_api/routes/xcoms.py
> > > > > >
> > > > > > All three have a "# TODO: Placeholder for actual implementation" comment.
> > > > > >
> > > > > > For variables and connections, vincbeck's data-layer team scoping (#58905 [4], #59476 [5]) already prevents cross-team data retrieval in practice.
A cross-team request returns a 404 rather than the resource. So the data isolation is there, but the auth stubs don't reject these requests early with a proper 403, and there's no second layer of protection at the auth check itself.
> > > > > >
> > > > > > For XComs, the situation is different. There is no isolation at any layer. XCom routes take dag_id, run_id, and task_id directly from URL path parameters with no validation against the calling task's identity. A task in Team-A's bundle can currently read and write Team-B's XComs.
> > > > > >
> > > > > > There's already a get_team_name_dep() function in deps.py that resolves a task's team via TaskInstance -> DagModel -> DagBundleModel -> Team in a single join query. The variable and connection endpoints already use it. XCom routes don't use it at all.
> > > > > >
> > > > > > Proposed approach:
> > > > > >
> > > > > > I'm thinking of this in two parts:
> > > > > >
> > > > > > 1) Team boundary checks for variables and connections
> > > > > >
> > > > > > Fill the auth stubs with team boundary checks. For reference, the Core API does this in security.py. requires_access_variable() resolves the resource's team via Variable.get_team_name(key), wraps it in VariableDetails, and passes it to auth_manager.is_authorized_variable(method, details, user). The auth manager then checks team membership.
> > > > > >
> > > > > > For the Execution API, the flow would be similar but without going through the auth manager (I'll explain why below):
> > > > > >
> > > > > > variable_key -> Variable.get_team_name(key) -> resource_team
> > > > > > token.id -> get_team_name_dep() -> task_team
> > > > > > Deny if resource_team != task_team (when both are non-None)
> > > > > >
> > > > > > When core.multi_team is disabled, get_team_name_dep returns None and the check is skipped, so current single-team behavior stays exactly the same.
> > > > > >
> > > > > > 2) XCom authorization
> > > > > >
> > > > > > This is the harder part. For writes, I think we should verify the calling task is writing its own XComs -- the task identity from the JWT should match the dag_id/task_id in the URL path. For reads, enforce the team boundary so a task can only read XComs from tasks within the same team. This would allow cross-DAG xcom_pull within a team (which people already do) while preventing cross-team access.
> > > > > >
> > > > > > To avoid a DB lookup on every request, I'd propose adding dag_id to the JWT claims at generation time. The dag_id is already on the TaskInstance schema in ExecuteTask.make() (workloads.py:142). The JWTReissueMiddleware already preserves all claims during token refresh, so this wouldn't break anything. Adding task_id and run_id to the token could be done as a follow-up -- there's a TODO at xcoms.py:315 about eventually deriving these from the token instead of the URL.
> > > > > >
> > > > > > I'm not proposing to add team_name to the token. It's not available on the TaskInstance schema at generation time.
Resolving it requires a DB join through DagModel -> DagBundleModel -> Team, which would slow down the scheduler's task queuing path. Better to resolve it at request time via get_team_name_dep.
> > > > > >
> > > > > > Why not go through BaseAuthManager?
> > > > > >
> > > > > > One design question I want to raise: the Execution API auth stubs currently don't call BaseAuthManager.is_authorized_*(), and I think they probably shouldn't. The BaseAuthManager interface is designed around human identity (BaseUser with roles and team memberships), but the Execution API operates on task identity (TIToken with a UUID). These are very different things. A task doesn't have a "role" in the RBAC sense; it has a team derived from its DAG's bundle.
> > > > > >
> > > > > > I'm leaning toward keeping the authorization logic directly in the has_*_access dependency functions, using get_team_name_dep for team resolution. This keeps the Execution API auth simple and avoids tying task authorization to the human auth manager. But I'd like to hear if others think we should instead extend BaseAuthManager with task-identity-aware methods.
> > > > > >
> > > > > > What about single-team deployments?
> > > > > >
> > > > > > When core.multi_team=False (the default for most deployments), the team boundary checks would be skipped entirely for variables and connections. For XComs, I think write ownership verification (a task can only write its own XComs) is worth keeping regardless of multi-team mode -- it's more of a correctness concern than an authorization one. But I can also see the argument for a complete no-op when multi_team is off to keep things simple.
> > > > > >
> > > > > > Out of scope:
> > > > > >
> > > > > > AIP-72 [3] mentions three possible authorization models: pre-declaration (DAGs declare required resources), runtime request with deployment-level policy, and OPA integration via WASM bindings. I'm not trying to address any of those here. The team-boundary enforcement is the base that all three future models need.
> > > > > >
> > > > > > Implementation plan:
> > > > > >
> > > > > > 1. Add dag_id claim to JWT token generation in workloads.py
> > > > > > 2. Implement has_variable_access team boundary check
> > > > > > 3. Implement has_connection_access team boundary check
> > > > > > 4. Implement has_xcom_access with write ownership + team boundary
> > > > > > 5. Add XCom team resolution (XCom routes currently have no get_team_name_dep usage)
> > > > > > 6. Tests for all authorization scenarios including cross-team denial
> > > > > > 7. Documentation update for multi-team authorization behavior
> > > > > >
> > > > > > This should be a fairly small change -- mostly filling in the existing stubs with actual checks.
> > > > > >
> > > > > > Let me know what you think.
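The write-ownership rule from part 2 of the quoted proposal can likewise be sketched as a pure predicate. Note the assumptions: claim names `dag_id` and `task_id` are illustrative, and a `task_id` claim is proposed only as a follow-up in the email, so this sketch runs ahead of what the plan adds in step 1:

```python
def xcom_write_allowed(token_claims: dict, path_dag_id: str, path_task_id: str) -> bool:
    """A task may only write XComs under its own identity: the dag_id and
    task_id claims from the task's JWT must match the URL path parameters.
    Missing claims fail closed (write denied)."""
    return (
        token_claims.get("dag_id") == path_dag_id
        and token_claims.get("task_id") == path_task_id
    )
```

Because this compares only token claims against path parameters, it needs no DB lookup, which is the stated motivation for putting dag_id into the JWT at generation time.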
> > > > > >
> > > > > > Anish
> > > > > >
> > > > > > [1] https://github.com/apache/airflow/issues/60125#issuecomment-3712218766
> > > > > > [2] https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-67+Multi-team+deployment+of+Airflow+components
> > > > > > [3] https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-72+Task+Execution+Interface+aka+Task+SDK
> > > > > > [4] https://github.com/apache/airflow/pull/58905
> > > > > > [5] https://github.com/apache/airflow/pull/59476
> > > > > >
> > > > > > ---------------------------------------------------------------------
> > > > > > To unsubscribe, e-mail: [email protected]
> > > > > > For additional commands, e-mail: [email protected]
