Thank you so much, Ash and Jarek, for the clarifications. I learned a lot from these responses.
On Thu, Feb 26, 2026 at 8:40 AM Jarek Potiuk <[email protected]> wrote: > > Good stuff Ash. Thanks for all those well-thought-out items :). > > If we can make it, I think we can have a bit more discussion at the devlist > today (I added a topic for that). > Let me add a few comments; that might lead to a few points we can cover > during the call. > > > > No, the supervisor does nothing with those secrets. It does not generate > > any tokens. The token is generated on the executor and sent to the worker > > via some mechanism (outside of the Task Execution API. That’s the executor’s > > responsibility.) > > > > Good. We received a few reports about it from security researchers (we > discarded them as invalid because, until now, the JWT_TOKEN "scope" did not > matter). Consequently, it's not clearly documented, and we are about to > start making some claims here. > > > > > If the API server were the sole > > > token issuer, the scheduler would dispatch tasks with just the task > > > identity, no token, no signing key > > > > > > You can (and probably should) run the Workers _without_ that setting set. > > This is a bit of a departure from the (unwritten?) rule that all Airflow > > components should have the same config. I will propose this change in our > > docs. Or if someone wants to do that I’ll approve it. > > > > Unfortunately, it is well-documented: > https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html > > We even explain that even if some configuration values refer to other > components, they should be configured the same for both the worker and the > web server. We also provide an example of the API server secret key. We do > not have a clear explanation of which configuration values should be set for > which components - just a "configuration group" is not enough, as the example given shows. Here is the paragraph: > > *> Use the same configuration across all the Airflow components. 
While each > component does not require all, some configurations need to be same > otherwise they would not work as expected. A good example for that is > secret_key which should be same on the Webserver and Worker to allow > Webserver to fetch logs from Worker.* > > Also, I believe it's more than just documentation. We also have: > > * As described in > https://airflow.apache.org/docs/apache-airflow/stable/howto/set-config.html#setting-configuration-options, > the "airflow config --defaults" tool produces an example configuration for > airflow (a single file containing only the necessary configurations, > without separate configurations for each component). We should likely have > a way to use separate configuration files for separate components. Even > better - I think - we should have a "secure" version that does not produce > defaults for sensitive configurations. I think we must force users to pass > these via environment variables (see my comments about PR_SET_DUMPABLE > below). > > * the docker compose file we release, where not only is a single "airflow.cfg" > shared between different components, but AIRFLOW__API_AUTH__JWT_SECRET is also in the > `<<: *airflow-common` section that is shared by all components > > * the Helm chart, where "airflow_config_mount" shares the same "airflow.cfg" file > between all components. > > IMHO, if we want to introduce a **new** approach where different > configuration parameters should be set for different components, all those > places should cover this and provide a way to set different values (or > explicitly document that they are not implementing the isolation > properties). This might actually be simpler - almost no work needed except > having only non-sensitive config - if we implement a "secure" running mode > where sensitive "configuration" is only available via env vars (see below). > > Also, it does not really solve the problem of the DagFileProcessor and Triggerer. 
> Both our processes and user processes currently access the same > configuration files (for example, the DB configuration string), allowing > the user code to do "anything". If we want to promise certain levels of > isolation (which we have done partially but not precisely so far), we need > a solution for running user code in both DagFileProcessor child processes > and the triggerer's async loop. > > > > Tasks need to be able to operate while only being _pushed_ information. > > > > Also there is https://github.com/apache/airflow/pull/60108 in progress > > which, along with a change I’m working on, will mean that the token that is being > > sent along with a workload can only be used to call the /run endpoint once, > > which helps a lot here. > > > > > Yep. That's a nice improvement. However, each executor should have a > somewhat secure way to pass the original token and clearly state its > security "level" in this context (for example, the LocalExecutor provides > no isolation or security, as you noted). > > > > I’m looking at whether we can drop capabilities in the supervisor, such that > > the forked user-code process is then also unable to read memory of another > > process even of the same user. This change, when coupled with > > removing/documenting/warning that the jwt_secret should not be on workers would, I > > think, make this entirely secure. > > > > Not quite the capabilities subsystem, but there is a prctl syscall and the > > PR_SET_DUMPABLE flag > > https://man7.org/linux/man-pages/man2/PR_SET_DUMPABLE.2const.html we can > > use - see some simple testing > > https://gist.github.com/ashb/5d62f244b837cb4052743318eb18fdc6 - PR > > incoming today. I’ll make sure to include the Workers (i.e. celery > > processes, as the JWT transits through those as well) > > > > Yep - good idea. That's one of the options I looked up (with suid and > cgroups). 
It seems to add a good layer of defense against reading memory > from the parent process (as long as we do it early enough so sensitive data > isn't already in memory before forking - if the data is already there, the > child process will access it). The celery process forking dance should > ensure the celery "master" process that forks the supervisors never sees > any sensitive credentials. > > Still, I think that alone does not prevent the DagFileProcessor parsing > processes and Triggerer async coroutines from accessing the configuration > the main process ran with if the values are configured in the configuration > files. My thinking so far was that we could use `PR_SET_DUMPABLE` similarly > for both the Triggerer and DagFileProcessor, but any sensitive data should > **only** ever be set via environment variables - because PR_SET_DUMPABLE > also protects /proc/pid/environ in the same way it protects /proc/pid/mem > (but it does not prevent reading from configuration files). > > I believe we could enforce that if we ensure no "non-shareable" sensitive > configuration variable is ever read from the config file. I would say that > we might need a "security_isolation" flag or similar that will, for > example, prevent reading such sensitive information from config files (and > even actively fail when one is found). Using PR_SET_DUMPABLE for all > components needing isolation might be a very secure solution. It could also > perform other checks and fail to start a component if any security-related > checks fail (for example, permissions for the home directory are too > open) - similar to what `ssh` does when it refuses keys with overly broad > permissions. It is also deployment-independent, which is very cool, because > we do not want to "force" users to configure sudo-allowed UNIX users, for > example, or generally define several UNIX users (as was the case with > impersonation). > > I think applying all of that brings us very close to a pretty "safe" > isolation feature. 
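For readers unfamiliar with the prctl mechanism being discussed, here is a minimal sketch of dropping dumpability early in a supervisor-like process. This is illustrative only (Linux-specific; the function names are made up for this sketch and are not Airflow code):

```python
import ctypes
import os
import sys

# Constants from <linux/prctl.h>
PR_GET_DUMPABLE = 3
PR_SET_DUMPABLE = 4


def make_undumpable() -> None:
    """Mark this process non-dumpable via prctl(PR_SET_DUMPABLE, 0).

    Afterwards /proc/<pid>/mem and /proc/<pid>/environ become owned by
    root, so other processes of the same unprivileged user (e.g. a forked
    task) can no longer read this process's memory or original environment.
    This must run before secrets are handed to any child - and ideally
    before secrets enter memory at all.
    """
    if sys.platform != "linux":  # prctl is Linux-specific
        return
    libc = ctypes.CDLL(None, use_errno=True)
    if libc.prctl(PR_SET_DUMPABLE, 0, 0, 0, 0) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def is_dumpable() -> bool:
    """Return the current dumpable flag (always True on non-Linux here)."""
    if sys.platform != "linux":
        return True
    libc = ctypes.CDLL(None, use_errno=True)
    return bool(libc.prctl(PR_GET_DUMPABLE, 0, 0, 0, 0))
```

Note that the flag is per-process, so each component (supervisor, celery worker, triggerer) would need to set it for itself, early in startup.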
> > > > --- > > Claims > > > > > > So I’m not sure we _need_ to include the dag/task/run id etc. in the claim, > > as the UUID already uniquely identifies the TI row, and we need to fetch > > that on every API access to validate that the TI is actually still in a > > “valid” state to be able to speak to the Execution API. Specifically, that > > it is still the latest TI try (because if it’s not the latest attempt, the > > UUID will only be found in the ti history table, and that can't be running, > > so this task shouldn’t be able to call any API endpoints.) Given that, > > we can keep the tokens shorter and make the loaded TI object available in > > the request context. > > > > Indeed, I don't think we need to do it if we always make such a check - > the question is whether adding, for example, team_id would allow us to avoid > some join queries. This is likely a question for Vincent and Nicolas. > > > > > > --- > > LocalExecutor > > > > Honestly, there’s not _much_ protection you can do when running things > > with the local executor. I’d say we document “if you care about this > > pattern of protection, do not use local exec”. (Because the secrets are > > almost certainly readable on disk.) > > > > Oh absolutely. We should make sure that this is well documented. > > > > > > > > > On 25 Feb 2026, at 06:33, Anish Giri <[email protected]> wrote: > > > > > > Hi Jarek, > > > > > > As we are waiting for Ash/Kaxil/Amogh's input, I tried to trace the > > > token flow through the codebase. I would appreciate your thoughts on > > > it. > > > > > > Your point about the forked task always being able to extract the > > > signing key made me rethink the whole approach. I was wondering if the > > > signing key actually needs to be in the scheduler/worker process at > > > all? > > > > > > Right now the scheduler loads the signing key at startup and generates > > > the tokens in its scheduling loop. 
If the API server were the sole > > > token issuer, the scheduler would dispatch tasks with just the task > > > identity, no token, no signing key. The worker would request a scoped > > > execution token from the API server before calling `start()`. Nothing > > > for a forked task to extract. Some of the foundation for this exists > > > in the scope-based token work I'm doing in PR #60108. This would fully > > > cover distributed deployments (Celery, Kubernetes) where the task has > > > no path to the API server's memory. > > > > > > For co-located deployments (LocalExecutor), a task running as the same > > > Unix user could still reach the API server's memory via `/proc`. But > > > if the API server registers every token's JTI at issuance and rejects > > > unregistered JTIs, a forged token gets rejected even with a valid > > > signature, because its JTI was never issued. The infrastructure for > > > this seems straightforward: a table similar to `revoked_token` from > > > #61339, with the same JTI lookup pattern but inverted logic. > > > > > > I think the combination would cover all deployment topologies. > > > Please correct me if I am wrong. OS-level hardening could still be > > > recommended, but I think it wouldn't be required. > > > > > > I might be missing something obvious. I would love to hear if there's > > > a flaw in this reasoning, or if the original authors had a different > > > approach in mind. > > > > > > Anish > > > > > > On Fri, Feb 20, 2026 at 6:58 PM Jarek Potiuk <[email protected]> wrote: > > >> > > >>> You mentioned having some ideas on the cryptographically strong > > >> provenance side and I would really like to hear them. 
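The "register at issuance, reject unknown JTIs" idea above could be sketched like this. This is an in-memory stand-in for the proposed DB table; the class and method names are hypothetical, not existing Airflow APIs:

```python
import secrets
import time


class JTIRegistry:
    """Sketch of an issued-token allowlist (inverse of a revocation list).

    The real thing would be a DB table similar to `revoked_token`, with the
    same JTI lookup pattern but inverted logic: a token is valid only if
    its JTI was recorded by the API server when the token was issued, so a
    forged token fails even with a valid signature.
    """

    def __init__(self) -> None:
        # jti -> monotonic expiry timestamp
        self._issued: dict[str, float] = {}

    def issue(self, ttl_seconds: float = 600.0) -> str:
        """Generate a JTI and record it as issued."""
        jti = secrets.token_urlsafe(16)
        self._issued[jti] = time.monotonic() + ttl_seconds
        return jti

    def is_valid(self, jti: str) -> bool:
        """A JTI is valid only if we issued it and it has not expired."""
        expiry = self._issued.get(jti)
        return expiry is not None and time.monotonic() < expiry
```

A lookup on every Execution API call is the cost of this design, which is why the discussion keeps it alongside, rather than instead of, OS-level hardening.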
> > >> > > >> I would first like to hear the original thinking - from Ash, Kaxil, > > Amogh > > >> - I do not want to introduce too much complexity, because maybe I > > am > > >> indeed overcomplicating it; maybe there are some ways that were > > discussed > > >> before to protect the secret key and JWT signing process. > > >> > > >> So far my ideas are pretty complicated and generally involve instructing > > >> users to do **a number of extra** things in their deployment process that > > are > > >> far beyond installing the app and beyond the "python" realm, but I might be > > >> missing something obvious. > > >> > > >> J. > > >> > > >> > > >> On Sat, Feb 21, 2026 at 1:52 AM Jarek Potiuk <[email protected]> wrote: > > >> > > >>> It does not change much (and is not good for performance). The "spawn" > > >>> approach suffers from the same problem of "having access to the configuration that the > > >>> supervisor has". If the supervisor can read all configuration needed > > to get the > > >>> JWT_secret, then the process spawned from it can just repeat the same > > steps > > >>> that the supervisor process did to obtain the JWT_secret, and create a > > >>> JWT_token with any claims. Also, such spawned processes can dump memory > > of > > >>> the parent process via `/proc/<pid>/mem` if they are run with the same > > >>> user. Or use `gcore PID` to dump memory of the process to a file. > > >>> > > >>> This is controlled by the "ptrace" permission, which is generally enabled on > > all > > >>> Linux systems by default (in order to enable debugging - for example > > gdb > > >>> attaching to a running process or dumping core with gcore). You can > > disable > > >>> this permission via SELinux or the Yama Linux Security Module. And even that > > >>> does not restrict the capability of a spawned process to just read the > > same > > >>> configuration files or environment variables that the main process had > > and > > >>> re-create the JWT token with any claims. 
> > >>> > > >>> It's just how unix user process separation works - any process of a > > UNIX > > >>> user by default can do **anything** with any other processes of the > > same > > >>> UNIX user. > > >>> > > >>> J. > > >>> > > >>> > > >>> On Sat, Feb 21, 2026 at 1:27 AM Anish Giri <[email protected]> > > >>> wrote: > > >>> > > >>>> Hi Jarek, Vikram, > > >>>> > > >>>> Thanks for this, and I am really very glad that I posted it before > > >>>> writing any code. > > >>>> > > >>>> I spent some time going through your point about the fork model and > > >>>> the signing key. That's something I hadn't considered at all. I went > > >>>> and looked now at how the key flows through the code, and you're right > > >>>> that with fork the scheduler's heap gets inherited via copy-on-write, > > >>>> so the key material ends up in the worker's address space even though > > >>>> it is never explicitly passed. The task code runs in a second fork > > >>>> inside the supervisor, so it inherits the same memory. So the identity > > >>>> model isn't secure in the fork model, no matter what we build on top > > of > > >>>> it. > > >>>> > > >>>> There is one thing I was wondering, and please correct me if I am > > >>>> wrong: would switching from **fork** to **spawn** for its worker > > >>>> processes help here? Spawned workers start with a clean interpreter, > > so > > >>>> the signing key never enters their address space. And since the > > >>>> supervisor's fork inherits from the worker (which never had the key), > > >>>> the task would not have it either. > > >>>> > > >>>> Not sure if I'm oversimplifying it though. You mentioned having some > > >>>> ideas on the cryptographically strong provenance side and I would > > >>>> really like to hear them. > > >>>> > > >>>> Anish > > >>>> > > >>>> On Fri, Feb 20, 2026 at 3:32 PM Vikram Koka via dev > > >>>> <[email protected]> wrote: > > >>>>> > > >>>>> +1 to Jarek's comments and questions here. 
> > >>>>> > > >>>>> I am concerned that these proposed changes at the PR level could > > create > > >>>> an > > >>>>> illusion of security, potentially leading to many "security bugs" > > >>>> reported > > >>>>> by users who may have a very different expectation. > > >>>>> > > >>>>> We need to clearly articulate a set of security expectations here > > before > > >>>>> addressing this in a set of PRs. > > >>>>> > > >>>>> Vikram > > >>>>> > > >>>>> On Fri, Feb 20, 2026 at 1:23 PM Jarek Potiuk <[email protected]> > > wrote: > > >>>>> > > >>>>>> I think there is one more thing that I mentioned some time > > >>>> ago, and > > >>>>>> it's time to put it in more concrete words. > > >>>>>> > > >>>>>> Currently there is **no** protection against the tasks making claims > > >>>> that > > >>>>>> they belong to other tasks. While the running task by default > > >>>> receives the > > >>>>>> generated token to use from the supervisor - there is absolutely no > > >>>> problem > > >>>>>> for the forked task to inspect parent process memory to get the > > >>>>>> "supervisor" token that is used to sign the "task" token and > > generate > > >>>> a new > > >>>>>> token with **any** dag_id or task_id or basically any other claim. > > >>>>>> > > >>>>>> This is by design currently, because we do not have any control > > >>>> implemented, > > >>>>>> and part of the security model of Airflow 3.0 - 3.1 is that any task > > >>>> can > > >>>>>> perform **any** action on the task SDK and we never even attempt to > > verify > > >>>>>> which task or dag state it can modify, which connections or > > variables > > >>>> it > > >>>>>> accesses. We only need to know that this "task" was authorised by > > the > > >>>>>> scheduler to call the "task-sdk" API. > > >>>>>> > > >>>>>> With multi-team, this assumption is broken. We **need** to know and > > >>>>>> **enforce** task_id provenance. 
The situation when one task pretends > > >>>> to be > > >>>>>> another task is not acceptable any more - and violates basic > > isolation > > >>>>>> between the teams. > > >>>>>> > > >>>>>> As I understand it, the way the current supervisor -> task JWT token > > >>>> generation > > >>>>>> works is (and please correct me if I am wrong): > > >>>>>> > > >>>>>> * when the supervisor starts, it reads the configuration ("jwt_secret" / > > >>>>>> "jwt_private_key_path" / "jwt_kid") > > >>>>>> * when it starts a task, it uses this "secret" to generate a > > >>>> JWT_token for > > >>>>>> that task (with "dag_id", "dag_run_id", "task_instance_id" claims) - > > >>>> and it > > >>>>>> is used by the supervisor to communicate with the api_server > > >>>>>> * the forked task does not have a direct reference to that token nor to > > the > > >>>>>> jwt_secret when started - it does not get it passed > > >>>>>> * the executing task process is only supposed to communicate with the > > >>>>>> supervisor via in-process communication; it does not open a connection > > >>>> nor > > >>>>>> use the JWT_token directly > > >>>>>> > > >>>>>> Now ... the interesting thing is that while the forked process has no > > >>>>>> "easy" API to get the token and use it directly, > > or > > >>>> to > > >>>>>> generate a NEW token, no matter how hard we try, the forked > > >>>> task > > >>>>>> will **always** be able to access the "jwt_secret" and create its own > > >>>> JWT_token > > >>>>>> - and add **ANY** claims to that token. That's simply a consequence > > of > > >>>>>> using our fork model; an additional thing is that the (default) > > >>>> approach of > > >>>>>> using the same unix user in the forked process enables the forked > > >>>> process > > >>>>>> to read **any** of the information that the supervisor process accesses > > >>>>>> (including configuration files, env variables and even memory of the > > >>>>>> supervisor process). 
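A tiny, runnable demonstration of the fork consequence described above (any Unix; the secret value and function name here are stand-ins for illustration, not Airflow's actual jwt_secret handling):

```python
import os

# Stands in for a secret the supervisor loaded before forking the task.
SECRET_IN_PARENT_MEMORY = "jwt-secret-loaded-before-fork"


def forked_child_can_see_secret() -> str:
    """Fork a child and have it report a value from the parent's heap.

    The child gets a copy-on-write view of everything the parent had in
    memory at fork time, so a secret loaded before the fork is readable
    by the child even though it was never explicitly passed to it.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:  # child: the "task" reads the secret from inherited memory
        os.close(read_fd)
        os.write(write_fd, SECRET_IN_PARENT_MEMORY.encode())
        os._exit(0)
    # parent: collect whatever the child reported
    os.close(write_fd)
    chunks = []
    while True:
        chunk = os.read(read_fd, 1024)
        if not chunk:
            break
        chunks.append(chunk)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return b"".join(chunks).decode()
```

The same applies transitively: the task fork inside the supervisor inherits whatever the supervisor inherited or loaded before that second fork.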
> > >>>>>> > > >>>>>> There are two ways the running task can get the JWT_SECRET: > > >>>>>> > > >>>>>> * since the task process is forked from the supervisor - everything > > >>>> that the > > >>>>>> parent process has in memory is available to it - even if the method executed in the > > >>>> fork has > > >>>>>> no direct reference to it. The forked process can use "globals" and > > >>>> get to > > >>>>>> any variable, function, class, or method that the parent supervisor > > >>>> process > > >>>>>> has. It can read any data in the memory of the process. So if the > > >>>> JWT_Secret is > > >>>>>> already in memory of the parent process when the task process is > > >>>> forked, it > > >>>>>> is also in memory of the task process > > >>>>>> > > >>>>>> * since the task process is the same unix user as the parent process > > >>>> - it > > >>>>>> has access to all the same configuration and environment data. Even if > > >>>> the > > >>>>>> parent process clears os.environ - the child process can read the > > >>>> original > > >>>>>> environment variables the parent process was started with using the > > >>>>>> `/proc` filesystem (it just needs to know the parent process id - > > >>>> which it > > >>>>>> always has). Unless more sophisticated mechanisms are used - such as SELinux > > >>>>>> (requires a kernel with SELinux and configured system-level SELinux > > >>>> rules), > > >>>>>> user impersonation, or cgroups/proper access control to files > > >>>> (requires > > >>>>>> sudo access for the parent process) - such a forked process can do > > >>>>>> **everything** the parent process can do - including reading the > > >>>>>> configuration of the JWT_secret and creating JWT_tokens with (again) any > > >>>>>> task_instance_id, any dag_id, and dag_run_id claim. 
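To make the second point concrete, here is a small, Linux-only illustration of how any same-user process can recover the environment another process was *started* with, regardless of later os.environ changes (illustrative sketch; the function name is made up):

```python
def read_initial_environ(pid: int) -> dict[str, str]:
    """Parse /proc/<pid>/environ (NUL-separated KEY=VALUE entries).

    This file reflects the environment at process start, so a parent
    clearing os.environ after startup does not hide values from a
    same-user child - and a child always knows its parent's pid via
    os.getppid(). Requires Linux and same-UID (or root) access; a
    non-dumpable target (PR_SET_DUMPABLE=0) makes the file root-owned
    and unreadable to the child, which is the point of that hardening.
    """
    with open(f"/proc/{pid}/environ", "rb") as f:
        raw = f.read()
    result: dict[str, str] = {}
    for entry in raw.split(b"\x00"):
        if entry:
            key, _, value = entry.partition(b"=")
            result[key.decode(errors="replace")] = value.decode(errors="replace")
    return result
```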
> > >>>>>> So no matter what we do on the "server" side - the client side > > >>>> (supervisor) > > >>>>>> - in the default configuration already allows the task to pretend > > >>>> it is > > >>>>>> "whatever dag_id" - in which case server-side verification is > > >>>> pointless. > > >>>>>> > > >>>>>> I believe (Ashb? Kaxil? Amogh?) that was a deliberate decision when > > >>>> the API > > >>>>>> was designed and the Task SDK / JWT token for Airflow 3.0 was > > >>>> implemented > > >>>>>> (because we did not need it). > > >>>>>> > > >>>>>> I would love to hear if my thinking is wrong, but I highly doubt it, > > >>>> so I > > >>>>>> wonder what the original thoughts here were on how the task identity > > >>>> can > > >>>>>> have "cryptographically strong" provenance? I have some ideas for > > >>>> that, > > >>>>>> but I would love to hear what the original authors' thoughts are. > > >>>>>> > > >>>>>> J. > > >>>>>> > > >>>>>> On Fri, Feb 20, 2026 at 8:49 PM Anish Giri < > > [email protected]> > > >>>>>> wrote: > > >>>>>> > > >>>>>>> Thanks, Vincent! I appreciate your review. I'll get started on the > > >>>>>>> implementation and tag you on the PRs. > > >>>>>>> > > >>>>>>> Anish > > >>>>>>> > > >>>>>>> On Fri, Feb 20, 2026 at 8:23 AM Vincent Beck <[email protected]> > > >>>>>> wrote: > > >>>>>>>> > > >>>>>>>> Hey Anish, > > >>>>>>>> > > >>>>>>>> Everything you said makes sense to me. I might have questions on > > >>>>>>> specifics, but I'd rather keep them for the PRs; that'll make everything > > >>>> way > > >>>>>>> easier. > > >>>>>>>> > > >>>>>>>> Feel free to ping me on all your PRs, > > >>>>>>>> Vincent > > >>>>>>>> > > >>>>>>>> On 2026/02/20 07:34:47 Anish Giri wrote: > > >>>>>>>>> Hello everyone, > > >>>>>>>>> > > >>>>>>>>> Jarek asked for a proposal on #60125 [1] before implementing > > >>>> access > > >>>>>>>>> control for the Execution API's resource endpoints (variables, > > >>>>>>>>> connections, XComs), so here it is. 
> > >>>>>>>>> > > >>>>>>>>> After going through the codebase, I think this is really about > > >>>>>>>>> completing AIP-67's [2] multi-team boundary enforcement rather > > >>>> than > > >>>>>>>>> introducing a new security model. Most of the infrastructure > > >>>> already > > >>>>>>>>> exists. What's missing are the actual authorization checks. > > >>>>>>>>> > > >>>>>>>>> The current state: > > >>>>>>>>> > > >>>>>>>>> The Execution API has three authorization stubs that always > > >>>> return > > >>>>>>> True: > > >>>>>>>>> > > >>>>>>>>> - has_variable_access() in execution_api/routes/variables.py > > >>>>>>>>> - has_connection_access() in execution_api/routes/connections.py > > >>>>>>>>> - has_xcom_access() in execution_api/routes/xcoms.py > > >>>>>>>>> > > >>>>>>>>> All three have a "# TODO: Placeholder for actual implementation" > > >>>>>>> comment. > > >>>>>>>>> > > >>>>>>>>> For variables and connections, vincbeck's data-layer team > > >>>> scoping > > >>>>>>>>> (#58905 [4], #59476 [5]) already prevents cross-team data > > >>>> retrieval > > >>>>>> in > > >>>>>>>>> practice. A cross-team request returns a 404 rather than the > > >>>>>> resource. > > >>>>>>>>> So the data isolation is there, but the auth stubs don't reject > > >>>> these > > >>>>>>>>> requests early with a proper 403, and there's no second layer of > > >>>>>>>>> protection at the auth check itself. > > >>>>>>>>> > > >>>>>>>>> For XComs, the situation is different. There is no isolation at > > >>>> any > > >>>>>>>>> layer. XCom routes take dag_id, run_id, and task_id directly > > >>>> from URL > > >>>>>>>>> path parameters with no validation against the calling task's > > >>>>>>>>> identity. A task in Team-A's bundle can currently read and write > > >>>>>>>>> Team-B's XComs. 
> > >>>>>>>>> > > >>>>>>>>> There's already a get_team_name_dep() function in deps.py that > > >>>>>>>>> resolves a task's team via TaskInstance -> DagModel -> > > >>>> DagBundleModel > > >>>>>>>>> -> Team in a single join query. The variable and connection > > >>>> endpoints > > >>>>>>>>> already use it. XCom routes don't use it at all. > > >>>>>>>>> > > >>>>>>>>> Proposed approach: > > >>>>>>>>> > > >>>>>>>>> I'm thinking of this in two parts: > > >>>>>>>>> > > >>>>>>>>> 1) Team boundary checks for variables and connections > > >>>>>>>>> > > >>>>>>>>> Fill the auth stubs with team boundary checks. For reference, > > >>>> the > > >>>>>> Core > > >>>>>>>>> API does this in security.py. requires_access_variable() > > >>>> resolves the > > >>>>>>>>> resource's team via Variable.get_team_name(key), wraps it in > > >>>>>>>>> VariableDetails, and passes it to > > >>>>>>>>> auth_manager.is_authorized_variable(method, details, user). The > > >>>> auth > > >>>>>>>>> manager then checks team membership. > > >>>>>>>>> > > >>>>>>>>> For the Execution API, the flow would be similar but without > > >>>> going > > >>>>>>>>> through the auth manager (I'll explain why below): > > >>>>>>>>> > > >>>>>>>>> variable_key -> Variable.get_team_name(key) -> resource_team > > >>>>>>>>> token.id -> get_team_name_dep() -> task_team > > >>>>>>>>> Deny if resource_team != task_team (when both are non-None) > > >>>>>>>>> > > >>>>>>>>> When core.multi_team is disabled, get_team_name_dep returns > > >>>> None and > > >>>>>>>>> the check is skipped, so current single-team behavior stays > > >>>> exactly > > >>>>>>>>> the same. > > >>>>>>>>> > > >>>>>>>>> 2) XCom authorization > > >>>>>>>>> > > >>>>>>>>> This is the harder part. For writes, I think we should verify > > >>>> the > > >>>>>>>>> calling task is writing its own XComs -- the task identity from > > >>>> the > > >>>>>>>>> JWT should match the dag_id/task_id in the URL path. 
For reads, > > >>>>>>>>> enforce team boundary so a task can only read XComs from tasks > > >>>> within > > >>>>>>>>> the same team. This would allow cross-DAG xcom_pull within a > > >>>> team > > >>>>>>>>> (which people already do) while preventing cross-team access. > > >>>>>>>>> > > >>>>>>>>> To avoid a DB lookup on every request, I'd propose adding > > >>>> dag_id to > > >>>>>>>>> the JWT claims at generation time. The dag_id is already on the > > >>>>>>>>> TaskInstance schema in ExecuteTask.make() (workloads.py:142). > > >>>> The > > >>>>>>>>> JWTReissueMiddleware already preserves all claims during token > > >>>>>>>>> refresh, so this wouldn't break anything. Adding task_id and > > >>>> run_id > > >>>>>> to > > >>>>>>>>> the token could be done as a follow-up -- there's a TODO at > > >>>>>>>>> xcoms.py:315 about eventually deriving these from the token > > >>>> instead > > >>>>>> of > > >>>>>>>>> the URL. > > >>>>>>>>> > > >>>>>>>>> I'm not proposing to add team_name to the token. It's not > > >>>> available > > >>>>>> on > > >>>>>>>>> the TaskInstance schema at generation time. Resolving it > > >>>> requires a > > >>>>>> DB > > >>>>>>>>> join through DagModel -> DagBundleModel -> Team, which would > > >>>> slow > > >>>>>> down > > >>>>>>>>> the scheduler's task queuing path. Better to resolve it at > > >>>> request > > >>>>>>>>> time via get_team_name_dep. > > >>>>>>>>> > > >>>>>>>>> Why not go through BaseAuthManager? > > >>>>>>>>> > > >>>>>>>>> One design question I want to raise: the Execution API auth > > >>>> stubs > > >>>>>>>>> currently don't call BaseAuthManager.is_authorized_*(), and I > > >>>> think > > >>>>>>>>> they probably shouldn't. The BaseAuthManager interface is > > >>>> designed > > >>>>>>>>> around human identity (BaseUser with roles and team > > >>>> memberships), but > > >>>>>>>>> the Execution API operates on task identity (TIToken with a > > >>>> UUID). > > >>>>>>>>> These are very different things. 
A task doesn't have a "role" > > >>>> in the > > >>>>>>>>> RBAC sense, it has a team derived from its DAG's bundle. > > >>>>>>>>> > > >>>>>>>>> I'm leaning toward keeping the authorization logic directly in > > >>>> the > > >>>>>>>>> has_*_access dependency functions, using get_team_name_dep for > > >>>> team > > >>>>>>>>> resolution. This keeps the Execution API auth simple and avoids > > >>>> tying > > >>>>>>>>> task authorization to the human auth manager. But I'd like to > > >>>> hear if > > >>>>>>>>> others think we should instead extend BaseAuthManager with > > >>>>>>>>> task-identity-aware methods. > > >>>>>>>>> > > >>>>>>>>> What about single-team deployments? > > >>>>>>>>> > > >>>>>>>>> When core.multi_team=False (the default for most deployments), > > >>>> the > > >>>>>>>>> team boundary checks would be skipped entirely for variables and > > >>>>>>>>> connections. For XComs, I think write ownership verification > > >>>> (task > > >>>>>> can > > >>>>>>>>> only write its own XComs) is worth keeping regardless of > > >>>> multi-team > > >>>>>>>>> mode -- it's more of a correctness concern than an > > >>>> authorization one. > > >>>>>>>>> But I can also see the argument for a complete no-op when > > >>>> multi_team > > >>>>>>>>> is off to keep things simple. > > >>>>>>>>> > > >>>>>>>>> Out of scope: > > >>>>>>>>> > > >>>>>>>>> AIP-72 [3] mentions three possible authorization models: > > >>>>>>>>> pre-declaration (DAGs declare required resources), runtime > > >>>> request > > >>>>>>>>> with deployment-level policy, and OPA integration via WASM > > >>>> bindings. > > >>>>>>>>> I'm not trying to address any of those here. The team-boundary > > >>>>>>>>> enforcement is the base that all three future models need. > > >>>>>>>>> > > >>>>>>>>> Implementation plan: > > >>>>>>>>> > > >>>>>>>>> 1. Add dag_id claim to JWT token generation in workloads.py > > >>>>>>>>> 2. Implement has_variable_access team boundary check > > >>>>>>>>> 3. 
Implement has_connection_access team boundary check > > >>>>>>>>> 4. Implement has_xcom_access with write ownership + team > >>>> boundary > > >>>>>>>>> 5. Add XCom team resolution (XCom routes currently have no > > >>>>>>>>> get_team_name_dep usage) > > >>>>>>>>> 6. Tests for all authorization scenarios including cross-team > >>>> denial > > >>>>>>>>> 7. Documentation update for multi-team authorization behavior > > >>>>>>>>> > > >>>>>>>>> This should be a fairly small change -- mostly filling in the > > >>>>>> existing > > >>>>>>>>> stubs with actual checks. > > >>>>>>>>> > > >>>>>>>>> Let me know what you think. > > >>>>>>>>> > > >>>>>>>>> Anish > > >>>>>>>>> > > >>>>>>>>> [1] > > >>>>>>> > > >>>> > > https://github.com/apache/airflow/issues/60125#issuecomment-3712218766 > > >>>>>>>>> [2] > > >>>>>>> > > >>>>>> > > >>>> > > https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-67+Multi-team+deployment+of+Airflow+components > > >>>>>>>>> [3] > > >>>>>>> > > >>>>>> > > >>>> > > https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-72+Task+Execution+Interface+aka+Task+SDK > > >>>>>>>>> [4] https://github.com/apache/airflow/pull/58905 > > >>>>>>>>> [5] https://github.com/apache/airflow/pull/59476 > > >>>>>>>>> > > >>>>>>>>> > > >>>> --------------------------------------------------------------------- > > >>>>>>>>> To unsubscribe, e-mail: [email protected] > > >>>>>>>>> For additional commands, e-mail: [email protected]
