Good stuff Ash. Thanks for all those well-thought-out items :). If we can make it, I think we can have a bit more discussion at the devlist call today (I added a topic for that). Let me add a few comments; those might lead to a few points we can cover during the call.
> No, the supervisor does nothing with those secrets. It does not generate
> any tokens. The token is generated on the executor and sent to the worker
> via some mechanism (outside of the Task Execution API. That's the
> executor's responsibility.)

Good. We received a few reports about it from security researchers (we discarded them as invalid because, until now, the JWT_TOKEN "scope" did not matter). Consequently, it's not clearly documented, and we are about to start making some claims here.

> > If the API server were the sole
> > token issuer, the scheduler would dispatch tasks with just the task
> > identity, no token, no signing key
>
> You can (and probably should) run the Workers _without_ that setting set.
> This is a bit of a departure from the (unwritten?) rule that all Airflow
> components should have the same config. I will propose this change in our
> docs. Or if someone wants to do that I'll approve it.

Unfortunately, it is well-documented: https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html
We even explain that some configuration values which refer to other components should be configured the same for both the worker and the web server, and we give the API server secret key as an example. We do not have a clear explanation of which configuration values should be set for which components - just the "configuration group" is not enough, as the example shows. Here is the paragraph:

*> Use the same configuration across all the Airflow components. While each component does not require all, some configurations need to be same otherwise they would not work as expected. A good example for that is secret_key which should be same on the Webserver and Worker to allow Webserver to fetch logs from Worker.*

Also, I believe it's more than just documentation.
We also have:

* As described in https://airflow.apache.org/docs/apache-airflow/stable/howto/set-config.html#setting-configuration-options, the "airflow config --defaults" tool produces an example configuration for Airflow (a single file containing only the necessary configurations, without separate configurations for each component). We should likely have a way to use separate configuration files for separate components. Even better - I think - we should have a "secure" version that does not produce defaults for sensitive configurations. I think we must force users to pass these via environment variables (see my comments about PR_SET_DUMPABLE below).
* The docker compose we release, where not only is a single "airflow.cfg" shared between different components, but AIRFLOW__API_AUTH__JWT_SECRET also sits in the `<<: *airflow-common` section that is shared by all components.
* The Helm chart, where "airflow_config_mount" mounts the same "airflow.cfg" file into all components.

IMHO, if we want to introduce a **new** approach where different configuration parameters are set for different components, all those places should cover this and provide a way to set different values (or explicitly document that they do not implement the isolation properties). This might actually be simpler - almost no work needed except having only non-sensitive config - if we implement a "secure" running mode where sensitive configuration is only available via env vars (see below).

Also, it does not really solve the problem of the DagFileProcessor and Triggerer. Both our processes and user processes currently access the same configuration files (for example, the DB connection string), allowing the user code to do "anything". If we want to promise certain levels of isolation (which we have done partially, but not precisely, so far), we need a solution for running user code in both DagFileProcessor child processes and the triggerer's async loop.
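To make the "secure" running mode idea concrete, a startup check could be as simple as the following rough sketch of my own (the function name and the list of sensitive options are hypothetical examples, not existing Airflow code):

```python
import configparser

# Illustrative only: hypothetical (section, option) pairs that a "secure"
# mode would refuse to read from a config file. Not an official list.
SENSITIVE_OPTIONS = {
    ("api_auth", "jwt_secret"),
    ("database", "sql_alchemy_conn"),
    ("api", "secret_key"),
}


def check_secure_config(cfg_path: str) -> list[str]:
    """Return violations: sensitive options found in the config file.

    In the proposed mode such values must come from environment variables
    (AIRFLOW__<SECTION>__<OPTION>) so that PR_SET_DUMPABLE can shield them
    via /proc/<pid>/environ; a config file on disk cannot be protected
    the same way.
    """
    parser = configparser.ConfigParser()
    parser.read(cfg_path)  # silently yields an empty config if file missing
    return [
        f"{section}.{option} must be set via "
        f"AIRFLOW__{section.upper()}__{option.upper()}, not in {cfg_path}"
        for section, option in SENSITIVE_OPTIONS
        if parser.has_option(section, option)
    ]
```

Each component would run such a check at startup and actively fail on any violation, so env vars become the only channel for secrets.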
> Tasks need to be able to operate only being _pushed_ information.
>
> Also there is https://github.com/apache/airflow/pull/60108 in progress,
> which, along with a change I'm working on, will mean that the token that
> is sent along with a workload can only be used to call the /run endpoint
> once, which helps a lot here.

Yep. That's a nice improvement. However, each executor should have a somewhat secure way to pass the original token and clearly state its security "level" in this context (for example, the LocalExecutor provides no isolation or security, as you noted).

> I'm looking at if we can drop capabilities in the supervisor, such that
> the forked user-code process is then also unable to read memory of another
> process even of the same user. This change, when coupled with
> removing/documenting/warning that jwt_secret should not be on workers,
> would I think make this entirely secure.
>
> Not quite the capabilities subsystem, but there is a prctl syscall and the
> PR_SET_DUMPABLE flag
> https://man7.org/linux/man-pages/man2/PR_SET_DUMPABLE.2const.html we can
> use - see some simple testing
> https://gist.github.com/ashb/5d62f244b837cb4052743318eb18fdc6 - PR
> incoming today. I'll make sure to include the Workers (i.e. celery
> processes, as JWT transits through those as well)

Yep - good idea. That's one of the options I looked up (along with suid and cgroups). It seems to add a good layer of defense against reading memory from the parent process (as long as we do it early enough so sensitive data isn't already in memory before forking - if the data is already there, the child process will access it). The celery process forking dance should ensure the celery "master" process that forks the supervisors never sees any sensitive credentials.
Still, I think that alone does not prevent the DagFileProcessor parsing processes and Triggerer async coroutines from accessing the configuration the main process ran with, if the values are configured in the configuration files.

My thinking so far was that we could use `PR_SET_DUMPABLE` similarly for both the Triggerer and DagFileProcessor, but any sensitive data should **only** ever be set via environment variables - because PR_SET_DUMPABLE also protects /proc/pid/environ in the same way it protects /proc/pid/mem (but it does not prevent reading from configuration files). I believe we could enforce that if we ensure no "non-shareable" sensitive configuration variable is ever read from the config file. I would say that we might need a "security_isolation" flag or similar that will, for example, prevent reading such sensitive information from config files (and even actively fail when one is found).

Using PR_SET_DUMPABLE for all components needing isolation might be a very secure solution. It could also perform other checks and fail to start a component if any security-related checks fail (for example, permissions for the home directory are too open) - similar to what `ssh` does when it refuses keys with overly broad permissions. It is also deployment-independent, which is very cool, because we do not want to "force" users to configure sudo-allowed UNIX users, for example, or generally define several UNIX users (as was the case with impersonation). I think applying all of that brings us very close to a pretty "safe" isolation feature.

> ---
> Claims
>
> So I'm not sure we _need_ to include the dag/task/run id etc in the claim,
> as the UUID already uniquely identifies the TI row, and we need to fetch
> that on every API access to validate that the TI is actually still in a
> "valid" state to be able to speak to the Execution API. Specifically, that
> it is still the latest TI try (because if it's not the latest attempt, the
> UUID will only be found in the ti history table, and that can't be running,
> so this task shouldn't be able to execute any API endpoints.) Given that,
> we can keep the tokens shorter and make the loaded TI object available in
> the request context.

Indeed, I don't think we need to, as long as we always make such a check - the question is whether adding, for example, team_id would allow us to avoid some join queries. This is likely a question for Vincent and Nicolas.

> ---
> LocalExecutor
>
> Honestly, there's not _much_ protection you can do when running things
> with the local executor. I'd say we document "if you care about this
> pattern of protection, do not use local exec". (Because the secrets are
> almost certainly readable on disk)

Oh absolutely. We should make sure that this is well documented.

> > On 25 Feb 2026, at 06:33, Anish Giri <[email protected]> wrote:
> >
> > Hi Jarek,
> >
> > As we are waiting for Ash/Kaxil/Amogh's input, I tried to trace the
> > token flow through the codebase. I would appreciate your thoughts on it.
> >
> > Your point about the forked task always being able to extract the
> > signing key made me rethink the whole approach. I was wondering if the
> > signing key actually needs to be in the scheduler/worker process at all?
> >
> > Right now the scheduler loads the signing key at startup and generates
> > the tokens in its scheduling loop. If the API server were the sole
> > token issuer, the scheduler would dispatch tasks with just the task
> > identity, no token, no signing key. The worker would request a scoped
> > execution token from the API server before calling `start()`. Nothing
> > for a forked task to extract. Some of the foundation for this exists
> > in the scope based token work I'm doing in PR #60108.
> > This would fully cover distributed deployments (Celery, Kubernetes)
> > where the task has no path to the API server's memory.
> >
> > For co-located deployments (LocalExecutor), a task running as the same
> > Unix user could still reach the API server's memory via `/proc`. But
> > if the API server registers every token's JTI at issuance and rejects
> > unregistered JTIs, a forged token gets rejected even with a valid
> > signature, because its JTI was never issued. The infrastructure for
> > this seems straightforward: a table similar to `revoked_token` from
> > #61339, with the same JTI lookup pattern but with the inverted logic.
> >
> > I think that the combination would cover all deployment topologies.
> > Please correct me if I am wrong. OS-level hardening could still be
> > recommended, but I think it wouldn't be required.
> >
> > I might be missing something obvious. I would love to hear if there's
> > a flaw in this reasoning, or if the original authors had a different
> > approach in mind.
> >
> > Anish
> >
> > On Fri, Feb 20, 2026 at 6:58 PM Jarek Potiuk <[email protected]> wrote:
> >>
> >>> You mentioned having some ideas on the cryptographically strong
> >>> provenance side and I would really like to hear them.
> >>
> >> I would first like to hear the original thinking - from Ash, Kaxil,
> >> Amogh - I do not want to introduce too much complexity, because maybe
> >> I am indeed overcomplicating it; maybe there are some ways that were
> >> discussed before to protect the secret key and the JWT signing process.
> >>
> >> So far my ideas are pretty complicated and generally involve instructing
> >> users to do **a number of extra** things in their deployment process
> >> that are far beyond installing the app and beyond the "python" realm,
> >> but I might be missing something obvious.
> >>
> >> J.
> >> On Sat, Feb 21, 2026 at 1:52 AM Jarek Potiuk <[email protected]> wrote:
> >>
> >>> It does not change much (and is not good for performance). The "spawn"
> >>> suffers from the same "having access to the configuration that the
> >>> supervisor has". If the supervisor can read all configuration needed
> >>> to get the JWT_secret, then the process spawned from it can just
> >>> repeat the same steps that the supervisor process did to obtain the
> >>> JWT_secret, and create a JWT_token with any claims. Also, such spawned
> >>> processes can dump memory of the parent process via `/proc/<pid>/mem`
> >>> if they are run with the same user. Or use `gcore PID` to dump memory
> >>> of the process to a file.
> >>>
> >>> This is controlled by the "ptrace" permission that is generally
> >>> enabled on all Linux systems by default (in order to enable debugging
> >>> - for example gdb attaching to a running process or dumping core with
> >>> gcore). You can disable this permission via SELinux or the Yama Linux
> >>> Security Module. And even that does not restrict the capability of a
> >>> spawned process to just read the same configuration files or
> >>> environment variables that the main process had and re-create a
> >>> JWT-token with any claims.
> >>>
> >>> It's just how unix user process separation works - any process of a
> >>> UNIX user by default can do **anything** with any other processes of
> >>> the same UNIX user.
> >>>
> >>> J.
> >>>
> >>> On Sat, Feb 21, 2026 at 1:27 AM Anish Giri <[email protected]> wrote:
> >>>
> >>>> Hi Jarek, Vikram,
> >>>>
> >>>> Thanks for this, and I am really very glad that I posted it before
> >>>> writing any code.
> >>>>
> >>>> I spent some time going through your point about the fork model and
> >>>> the signing key. That's something I hadn't considered at all. I went
> >>>> and looked now at how the key flows through the code, and you're
> >>>> right that with fork the scheduler's heap gets inherited via
> >>>> copy-on-write, so the key material ends up in the worker's address
> >>>> space even though it is never explicitly passed. The task code runs
> >>>> in a second fork inside the supervisor, so it inherits the same
> >>>> memory. So the identity model isn't secure in the fork model, no
> >>>> matter what we build on top of it, anyway.
> >>>>
> >>>> There is one thing I was wondering, and please correct me if I am
> >>>> wrong: would switching from **fork** to **spawn** for its worker
> >>>> processes help here? Spawned workers start with a clean interpreter,
> >>>> so the signing key never gets to enter their address space. And since
> >>>> the supervisor's fork inherits from the worker (which never had the
> >>>> key), the task would not have it either now.
> >>>>
> >>>> Not sure if I'm oversimplifying it though. You mentioned having some
> >>>> ideas on the cryptographically strong provenance side and I would
> >>>> really like to hear them.
> >>>>
> >>>> Anish
> >>>>
> >>>> On Fri, Feb 20, 2026 at 3:32 PM Vikram Koka via dev
> >>>> <[email protected]> wrote:
> >>>>>
> >>>>> +1 to Jarek's comments and questions here.
> >>>>>
> >>>>> I am concerned that these proposed changes at the PR level could
> >>>>> create an illusion of security, potentially leading to many
> >>>>> "security bugs" reported by users who may have a very different
> >>>>> expectation.
> >>>>>
> >>>>> We need to clearly articulate a set of security expectations here
> >>>>> before addressing this in a set of PRs.
> >>>>>
> >>>>> Vikram
> >>>>>
> >>>>> On Fri, Feb 20, 2026 at 1:23 PM Jarek Potiuk <[email protected]> wrote:
> >>>>>
> >>>>>> I think there is one more thing that I've been mentioning some time
> >>>>>> ago, and it's time to put it in more concrete words.
> >>>>>>
> >>>>>> Currently there is **no** protection against the tasks making
> >>>>>> claims that they belong to other tasks. While the running task by
> >>>>>> default receives the generated token to use from the supervisor -
> >>>>>> there is absolutely no problem for the forked task to inspect
> >>>>>> parent process memory to get the "supervisor" token that is used to
> >>>>>> sign the "task" token and generate a new token with **any** dag_id
> >>>>>> or task_id or basically any other claim.
> >>>>>>
> >>>>>> This is by design currently, because we do not have any control
> >>>>>> implemented, and part of the security model of Airflow 3.0 - 3.1 is
> >>>>>> that any task can perform **any** action on the task SDK and we
> >>>>>> never even attempt to verify which tasks' or dags' state it can
> >>>>>> modify, or which connections or variables it accesses. We only need
> >>>>>> to know that this "task" was authorised by the scheduler to call
> >>>>>> the "task-sdk" API.
> >>>>>>
> >>>>>> With multi-team, this assumption is broken. We **need** to know and
> >>>>>> **enforce** task_id provenance. The situation when one task
> >>>>>> pretends to be another task is not acceptable any more - and
> >>>>>> violates basic isolation between the teams.
> >>>>>>
> >>>>>> As I understand it, the way the current supervisor -> task JWT
> >>>>>> token generation works is (and please correct me if I am wrong):
> >>>>>>
> >>>>>> * when the supervisor starts, it reads the configuration of
> >>>>>> ("jwt_secret" / "jwt_private_key_path" / "jwt_kid")
> >>>>>> * when it starts a task, it uses this "secret" to generate a
> >>>>>> JWT_token for that task (with "dag_id", "dag_run_id",
> >>>>>> "task_instance_id" claims) - and it is used by the supervisor to
> >>>>>> communicate with the api_server
> >>>>>> * the forked task does not have a direct reference to that token
> >>>>>> nor to the jwt_secret when started - it does not get it passed
> >>>>>> * the executing task process is only supposed to communicate with
> >>>>>> the supervisor via in-process communication; it does not open a
> >>>>>> connection nor use the JWT_token directly
> >>>>>>
> >>>>>> Now ... the interesting thing is that while the forked process does
> >>>>>> not have an "easy" API to get the token and use it directly, or to
> >>>>>> generate a NEW token - no matter how hard we try, the forked task
> >>>>>> will **always** be able to access the "jwt_secret" and create its
> >>>>>> own JWT_token - and add **ANY** claims to that token. That's simply
> >>>>>> a consequence of using our fork model. An additional thing is that
> >>>>>> the (default) approach of using the same unix user in the forked
> >>>>>> process enables the forked process to read **any** of the
> >>>>>> information that the supervisor process accesses (including
> >>>>>> configuration files, env variables and even memory of the
> >>>>>> supervisor process).
> >>>>>>
> >>>>>> There are two ways the running task can get the JWT_SECRET:
> >>>>>>
> >>>>>> * since the task process is forked from the supervisor - it has
> >>>>>> everything that the parent process has in memory, even if the
> >>>>>> method executed in the fork has no direct reference to it. The
> >>>>>> forked process can use "globals" and get to any variable, function,
> >>>>>> class, method that the parent supervisor process has. It can read
> >>>>>> any data in memory of the process. So if the JWT_Secret is already
> >>>>>> in memory of the parent process when the task process is forked, it
> >>>>>> also is in memory of the task process.
> >>>>>>
> >>>>>> * since the task process is the same unix user as the parent
> >>>>>> process - it has access to all the same configuration and
> >>>>>> environment data. Even if the parent process clears os.environ, the
> >>>>>> child process can read the original environment variables the
> >>>>>> parent process was started with using the `/proc` filesystem (it
> >>>>>> just needs to know the parent process id - which it always has).
> >>>>>> Unless more sophisticated mechanisms are used, such as SELinux
> >>>>>> (requires a kernel with SELinux and configured system-level SELinux
> >>>>>> rules), user impersonation, or cgroups/proper access control to
> >>>>>> files (requires sudo access for the parent process) - such a forked
> >>>>>> process can do **everything** the parent process can do - including
> >>>>>> reading the configuration of jwt_secret and creating JWT_tokens
> >>>>>> with (again) any task_instance_id, dag_id, and dag_run_id claim.
> >>>>>>
> >>>>>> So no matter what we do on the "server" side - the client side
> >>>>>> (supervisor) - in the default configuration - already allows the
> >>>>>> task to pretend to be "whatever dag id" - in which case server side
> >>>>>> verification is pointless.
> >>>>>>
> >>>>>> I believe (Ashb? Kaxil? Amogh?) that was a deliberate decision when
> >>>>>> the API was designed and the Task SDK / JWT token for Airflow 3.0
> >>>>>> was implemented (because we did not need it).
> >>>>>>
> >>>>>> I would love to hear if my thinking is wrong, but I highly doubt
> >>>>>> it, so I wonder what the original thoughts were here on how the
> >>>>>> task identity can have "cryptographically strong" provenance? I
> >>>>>> have some ideas for that, but I would love to hear what the
> >>>>>> original authors' thoughts are.
> >>>>>>
> >>>>>> J.
> >>>>>>
> >>>>>> On Fri, Feb 20, 2026 at 8:49 PM Anish Giri <[email protected]> wrote:
> >>>>>>
> >>>>>>> Thanks, Vincent! I appreciate your review. I'll get started on the
> >>>>>>> implementation and tag you on the PRs.
> >>>>>>>
> >>>>>>> Anish
> >>>>>>>
> >>>>>>> On Fri, Feb 20, 2026 at 8:23 AM Vincent Beck <[email protected]> wrote:
> >>>>>>>>
> >>>>>>>> Hey Anish,
> >>>>>>>>
> >>>>>>>> Everything you said makes sense to me. I might have questions on
> >>>>>>>> specifics, but I'd rather keep them for the PRs; that'll make
> >>>>>>>> everything way easier.
> >>>>>>>>
> >>>>>>>> Feel free to ping me on all your PRs,
> >>>>>>>> Vincent
> >>>>>>>>
> >>>>>>>> On 2026/02/20 07:34:47 Anish Giri wrote:
> >>>>>>>>> Hello everyone,
> >>>>>>>>>
> >>>>>>>>> Jarek asked for a proposal on #60125 [1] before implementing
> >>>>>>>>> access control for the Execution API's resource endpoints
> >>>>>>>>> (variables, connections, XComs), so here it is.
> >>>>>>>>>
> >>>>>>>>> After going through the codebase, I think this is really about
> >>>>>>>>> completing AIP-67's [2] multi-team boundary enforcement rather
> >>>>>>>>> than introducing a new security model. Most of the
> >>>>>>>>> infrastructure already exists. What's missing are the actual
> >>>>>>>>> authorization checks.
> >>>>>>>>>
> >>>>>>>>> The current state:
> >>>>>>>>>
> >>>>>>>>> The Execution API has three authorization stubs that always
> >>>>>>>>> return True:
> >>>>>>>>>
> >>>>>>>>> - has_variable_access() in execution_api/routes/variables.py
> >>>>>>>>> - has_connection_access() in execution_api/routes/connections.py
> >>>>>>>>> - has_xcom_access() in execution_api/routes/xcoms.py
> >>>>>>>>>
> >>>>>>>>> All three have a "# TODO: Placeholder for actual implementation"
> >>>>>>>>> comment.
> >>>>>>>>>
> >>>>>>>>> For variables and connections, vincbeck's data-layer team
> >>>>>>>>> scoping (#58905 [4], #59476 [5]) already prevents cross-team
> >>>>>>>>> data retrieval in practice. A cross-team request returns a 404
> >>>>>>>>> rather than the resource. So the data isolation is there, but
> >>>>>>>>> the auth stubs don't reject these requests early with a proper
> >>>>>>>>> 403, and there's no second layer of protection at the auth check
> >>>>>>>>> itself.
> >>>>>>>>>
> >>>>>>>>> For XComs, the situation is different. There is no isolation at
> >>>>>>>>> any layer. XCom routes take dag_id, run_id, and task_id directly
> >>>>>>>>> from URL path parameters with no validation against the calling
> >>>>>>>>> task's identity. A task in Team-A's bundle can currently read
> >>>>>>>>> and write Team-B's XComs.
> >>>>>>>>>
> >>>>>>>>> There's already a get_team_name_dep() function in deps.py that
> >>>>>>>>> resolves a task's team via TaskInstance -> DagModel ->
> >>>>>>>>> DagBundleModel -> Team in a single join query. The variable and
> >>>>>>>>> connection endpoints already use it. XCom routes don't use it at
> >>>>>>>>> all.
> >>>>>>>>>
> >>>>>>>>> Proposed approach:
> >>>>>>>>>
> >>>>>>>>> I'm thinking of this in two parts:
> >>>>>>>>>
> >>>>>>>>> 1) Team boundary checks for variables and connections
> >>>>>>>>>
> >>>>>>>>> Fill the auth stubs with team boundary checks. For reference,
> >>>>>>>>> the Core API does this in security.py.
> >>>>>>>>> requires_access_variable() resolves the resource's team via
> >>>>>>>>> Variable.get_team_name(key), wraps it in VariableDetails, and
> >>>>>>>>> passes it to auth_manager.is_authorized_variable(method,
> >>>>>>>>> details, user). The auth manager then checks team membership.
> >>>>>>>>>
> >>>>>>>>> For the Execution API, the flow would be similar but without
> >>>>>>>>> going through the auth manager (I'll explain why below):
> >>>>>>>>>
> >>>>>>>>> variable_key -> Variable.get_team_name(key) -> resource_team
> >>>>>>>>> token.id -> get_team_name_dep() -> task_team
> >>>>>>>>> Deny if resource_team != task_team (when both are non-None)
> >>>>>>>>>
> >>>>>>>>> When core.multi_team is disabled, get_team_name_dep returns None
> >>>>>>>>> and the check is skipped, so current single-team behavior stays
> >>>>>>>>> exactly the same.
> >>>>>>>>>
> >>>>>>>>> 2) XCom authorization
> >>>>>>>>>
> >>>>>>>>> This is the harder part. For writes, I think we should verify
> >>>>>>>>> the calling task is writing its own XComs -- the task identity
> >>>>>>>>> from the JWT should match the dag_id/task_id in the URL path.
> >>>>>>>>> For reads, enforce the team boundary so a task can only read
> >>>>>>>>> XComs from tasks within the same team. This would allow
> >>>>>>>>> cross-DAG xcom_pull within a team (which people already do)
> >>>>>>>>> while preventing cross-team access.
> >>>>>>>>>
> >>>>>>>>> To avoid a DB lookup on every request, I'd propose adding dag_id
> >>>>>>>>> to the JWT claims at generation time. The dag_id is already on
> >>>>>>>>> the TaskInstance schema in ExecuteTask.make()
> >>>>>>>>> (workloads.py:142). The JWTReissueMiddleware already preserves
> >>>>>>>>> all claims during token refresh, so this wouldn't break
> >>>>>>>>> anything. Adding task_id and run_id to the token could be done
> >>>>>>>>> as a follow-up -- there's a TODO at xcoms.py:315 about
> >>>>>>>>> eventually deriving these from the token instead of the URL.
> >>>>>>>>>
> >>>>>>>>> I'm not proposing to add team_name to the token. It's not
> >>>>>>>>> available on the TaskInstance schema at generation time.
> >>>>>>>>> Resolving it requires a DB join through DagModel ->
> >>>>>>>>> DagBundleModel -> Team, which would slow down the scheduler's
> >>>>>>>>> task queuing path. Better to resolve it at request time via
> >>>>>>>>> get_team_name_dep.
> >>>>>>>>>
> >>>>>>>>> Why not go through BaseAuthManager?
> >>>>>>>>>
> >>>>>>>>> One design question I want to raise: the Execution API auth
> >>>>>>>>> stubs currently don't call BaseAuthManager.is_authorized_*(),
> >>>>>>>>> and I think they probably shouldn't. The BaseAuthManager
> >>>>>>>>> interface is designed around human identity (BaseUser with roles
> >>>>>>>>> and team memberships), but the Execution API operates on task
> >>>>>>>>> identity (TIToken with a UUID). These are very different things.
> >>>>>>>>> A task doesn't have a "role" in the RBAC sense; it has a team
> >>>>>>>>> derived from its DAG's bundle.
> >>>>>>>>>
> >>>>>>>>> I'm leaning toward keeping the authorization logic directly in
> >>>>>>>>> the has_*_access dependency functions, using get_team_name_dep
> >>>>>>>>> for team resolution. This keeps the Execution API auth simple
> >>>>>>>>> and avoids tying task authorization to the human auth manager.
> >>>>>>>>> But I'd like to hear if others think we should instead extend
> >>>>>>>>> BaseAuthManager with task-identity-aware methods.
> >>>>>>>>>
> >>>>>>>>> What about single-team deployments?
> >>>>>>>>>
> >>>>>>>>> When core.multi_team=False (the default for most deployments),
> >>>>>>>>> the team boundary checks would be skipped entirely for variables
> >>>>>>>>> and connections. For XComs, I think write ownership verification
> >>>>>>>>> (a task can only write its own XComs) is worth keeping
> >>>>>>>>> regardless of multi-team mode -- it's more of a correctness
> >>>>>>>>> concern than an authorization one. But I can also see the
> >>>>>>>>> argument for a complete no-op when multi_team is off, to keep
> >>>>>>>>> things simple.
> >>>>>>>>>
> >>>>>>>>> Out of scope:
> >>>>>>>>>
> >>>>>>>>> AIP-72 [3] mentions three possible authorization models:
> >>>>>>>>> pre-declaration (DAGs declare required resources), runtime
> >>>>>>>>> request with deployment-level policy, and OPA integration via
> >>>>>>>>> WASM bindings. I'm not trying to address any of those here. The
> >>>>>>>>> team-boundary enforcement is the base that all three future
> >>>>>>>>> models need.
> >>>>>>>>>
> >>>>>>>>> Implementation plan:
> >>>>>>>>>
> >>>>>>>>> 1. Add dag_id claim to JWT token generation in workloads.py
> >>>>>>>>> 2. Implement has_variable_access team boundary check
> >>>>>>>>> 3. Implement has_connection_access team boundary check
> >>>>>>>>> 4. Implement has_xcom_access with write ownership + team boundary
> >>>>>>>>> 5. Add XCom team resolution (XCom routes currently have no
> >>>>>>>>> get_team_name_dep usage)
> >>>>>>>>> 6. Tests for all authorization scenarios including cross-team
> >>>>>>>>> denial
> >>>>>>>>> 7. Documentation update for multi-team authorization behavior
> >>>>>>>>>
> >>>>>>>>> This should be a fairly small change -- mostly filling in the
> >>>>>>>>> existing stubs with actual checks.
> >>>>>>>>>
> >>>>>>>>> Let me know what you think.
> >>>>>>>>>
> >>>>>>>>> Anish
> >>>>>>>>>
> >>>>>>>>> [1] https://github.com/apache/airflow/issues/60125#issuecomment-3712218766
> >>>>>>>>> [2] https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-67+Multi-team+deployment+of+Airflow+components
> >>>>>>>>> [3] https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-72+Task+Execution+Interface+aka+Task+SDK
> >>>>>>>>> [4] https://github.com/apache/airflow/pull/58905
> >>>>>>>>> [5] https://github.com/apache/airflow/pull/59476
> >>>>>>>>>
> >>>>>>>>> ---------------------------------------------------------------------
> >>>>>>>>> To unsubscribe, e-mail: [email protected]
> >>>>>>>>> For additional commands, e-mail: [email protected]
