Hi Jarek, Vikram,

Thanks for this, and I'm really glad I posted it before writing any code.
I spent some time going through your point about the fork model and the signing key; that's something I hadn't considered at all. I went and looked at how the key flows through the code, and you're right: with fork, the scheduler's heap gets inherited via copy-on-write, so the key material ends up in the worker's address space even though it is never explicitly passed. The task code runs in a second fork inside the supervisor, so it inherits the same memory. So the identity model isn't secure under the fork model, no matter what we build on top of it.

There is one thing I was wondering, and please correct me if I am wrong: would switching from **fork** to **spawn** for the worker processes help here? Spawned workers start with a clean interpreter, so the signing key never enters their address space. And since the fork inside the supervisor inherits from the worker (which never had the key), the task would not have it either. Not sure if I'm oversimplifying it though.

You mentioned having some ideas on the cryptographically strong provenance side, and I would really like to hear them.

Anish

On Fri, Feb 20, 2026 at 3:32 PM Vikram Koka via dev <[email protected]> wrote:
>
> +1 to Jarek's comments and questions here.
>
> I am concerned that these proposed changes at the PR level could create an illusion of security, potentially leading to many "security bugs" reported by users who may have a very different expectation.
>
> We need to clearly articulate a set of security expectations here before addressing this in a set of PRs.
>
> Vikram
>
> On Fri, Feb 20, 2026 at 1:23 PM Jarek Potiuk <[email protected]> wrote:
> >
> > I think there is one more thing that I've been mentioning some time ago, and it's time to put it in more concrete words.
> >
> > Currently there is **no** protection against tasks making claims that they belong to other tasks. While the running task by default receives the generated token to use from the supervisor - there is absolutely no problem for the forked task to inspect parent process memory to get the "supervisor" token that is used to sign the "task" token and generate a new token with **any** dag_id or task_id or basically any other claim.
> >
> > This is by design currently, because we do not have any control implemented, and part of the security model of Airflow 3.0 - 3.1 is that any task can perform **any** action on the task SDK; we never even attempt to verify which tasks' or dags' state it can modify, or which connections or variables it accesses. We only need to know that this "task" was authorised by the scheduler to call the "task-sdk" API.
> >
> > With multi-team, this assumption is broken. We **need** to know and **enforce** task_id provenance. The situation where one task pretends to be another task is not acceptable any more - and violates basic isolation between the teams.
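To make the fork vs. spawn point concrete, here is a minimal standalone sketch (not Airflow code; it assumes PyJWT is available, and the SECRET variable and the claims are made-up placeholders). A fork-started child sees whatever secret the parent loaded into module globals and can sign a token with arbitrary claims; a spawn-started child gets a fresh interpreter and never sees the runtime-loaded secret:

```python
import multiprocessing as mp

import jwt  # PyJWT (assumed available); used only to show that signing works

# Placeholder for a signing secret the "supervisor" loads at runtime
# (e.g. from config). It is intentionally NOT set at import time.
SECRET = None


def child(start_method: str) -> None:
    if SECRET is None:
        # spawn: the module was re-imported in a fresh interpreter,
        # so the runtime-loaded secret simply does not exist here.
        print(f"{start_method}: no secret inherited, cannot forge a token")
        return
    # fork: the parent's heap (including SECRET) was inherited copy-on-write,
    # so the child can mint a token with any claims it likes.
    forged = jwt.encode(
        {"dag_id": "some_other_teams_dag", "task_id": "anything"},
        SECRET,
        algorithm="HS256",
    )
    print(f"{start_method}: forged token with arbitrary claims: {forged[:24]}...")


if __name__ == "__main__":
    SECRET = "jwt-signing-secret-read-from-config"  # simulate loading the key
    for method in ("fork", "spawn"):  # "fork" is not available on Windows
        p = mp.get_context(method).Process(target=child, args=(method,))
        p.start()
        p.join()
```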
> > As I understand it, the way the current supervisor -> task JWT token generation works is (and please correct me if I am wrong):
> >
> > * when the supervisor starts, it reads the configuration ("jwt_secret" / "jwt_private_key_path" / "jwt_kid")
> > * when it starts a task, it uses this "secret" to generate a JWT_token for that task (with "dag_id", "dag_run_id", "task_instance_id" claims) - and it is used by the supervisor to communicate with the api_server
> > * the forked task does not have a direct reference to that token nor to the jwt_secret when started - it does not get it passed
> > * the executing task process is only supposed to communicate with the supervisor via in-process communication; it does not open a connection nor use the JWT_token directly
> >
> > Now ... the interesting thing is that while the forked process does not have an "easy" API for it, it can not only get the token and use it directly, but also generate a NEW token, because no matter how hard we try, the forked task will **always** be able to access the "jwt_secret" and create its own JWT_token - and add **ANY** claims to that token. That's simply a consequence of using our fork model; an additional thing is that the (default) approach of using the same unix user in the forked process enables the forked process to read **any** of the information that the supervisor process accesses (including configuration files, env variables and even the memory of the supervisor process).
> >
> > There are two ways the running task can get the JWT_SECRET:
> >
> > * since the task process is forked from the supervisor, it has everything the parent process has in memory - even if the method executed in the fork has no direct reference to it. The forked process can use "globals" and get to any variable, function, class or method that the parent supervisor process has. It can read any data in the memory of the process. So if the jwt_secret is already in memory of the parent process when the task process is forked, it is also in memory of the task process
> >
> > * since the task process is the same unix user as the parent process, it has access to all the same configuration and environment data. Even if the parent process clears os.environ, the child process can read the original environment variables the parent process was started with using the `/proc` filesystem (it just needs to know the parent process id - which it always has). Unless more sophisticated mechanisms are used, such as SELinux (requires a kernel with SELinux and configured system-level SELinux rules), user impersonation, or cgroups/proper access control to files (requires sudo access for the parent process), such a forked process can do **everything** the parent process can do - including reading the jwt_secret configuration and creating JWT_tokens with (again) any task_instance_id, dag_id and dag_run_id claims.
> >
> > So no matter what we do on the "server" side, the client side (supervisor) in the default configuration already allows the task to pretend to be "whatever dag id" - in which case server-side verification is pointless.
> >
> > I believe (Ashb? Kaxil? Amogh?) that was a deliberate decision when the API was designed, when the Task SDK / JWT token for Airflow 3.0 was implemented (because we did not need it).
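A similar standalone sketch of the second route Jarek lists (a same-unix-user process reading the parent's original environment via /proc). This is also why spawn alone would not close the hole while the task still runs as the same user as the supervisor. Linux-only; the environment variable name at the end is illustrative, not necessarily the real config key:

```python
import os


def parent_startup_environ(pid: int) -> dict[str, str]:
    # /proc/<pid>/environ holds the environment the process was *started* with,
    # NUL-separated. Clearing os.environ later in that process does not change
    # it, and any process running as the same unix user can read it.
    with open(f"/proc/{pid}/environ", "rb") as f:
        raw = f.read()
    entries = (item.partition(b"=") for item in raw.split(b"\x00") if item)
    return {key.decode(): value.decode(errors="replace") for key, _, value in entries}


if __name__ == "__main__":
    # A task process always knows its parent's pid.
    env = parent_startup_environ(os.getppid())
    # Illustrative name only: whatever variable carries the configured jwt_secret.
    print(env.get("AIRFLOW__API_AUTH__JWT_SECRET"))
```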
> > I would love to hear if my thinking is wrong, but I highly doubt it, so I wonder what the original thoughts were here on how the task identity can have "cryptographically strong" provenance? I have some ideas for that, but I would love to hear what the original authors' thoughts are?
> >
> > J.
> >
> > On Fri, Feb 20, 2026 at 8:49 PM Anish Giri <[email protected]> wrote:
> > >
> > > Thanks, Vincent! I appreciate your review. I'll get started on the implementation and tag you on the PRs.
> > >
> > > Anish
> > >
> > > On Fri, Feb 20, 2026 at 8:23 AM Vincent Beck <[email protected]> wrote:
> > > >
> > > > Hey Anish,
> > > >
> > > > Everything you said makes sense to me. I might have questions on specifics, but I'd rather keep them for the PRs; that'll make everything way easier.
> > > >
> > > > Feel free to ping me on all your PRs,
> > > > Vincent
> > > >
> > > > On 2026/02/20 07:34:47 Anish Giri wrote:
> > > > >
> > > > > Hello everyone,
> > > > >
> > > > > Jarek asked for a proposal on #60125 [1] before implementing access control for the Execution API's resource endpoints (variables, connections, XComs), so here it is.
> > > > >
> > > > > After going through the codebase, I think this is really about completing AIP-67's [2] multi-team boundary enforcement rather than introducing a new security model. Most of the infrastructure already exists. What's missing are the actual authorization checks.
> > > > >
> > > > > The current state:
> > > > >
> > > > > The Execution API has three authorization stubs that always return True:
> > > > >
> > > > > - has_variable_access() in execution_api/routes/variables.py
> > > > > - has_connection_access() in execution_api/routes/connections.py
> > > > > - has_xcom_access() in execution_api/routes/xcoms.py
> > > > >
> > > > > All three have a "# TODO: Placeholder for actual implementation" comment.
> > > > >
> > > > > For variables and connections, vincbeck's data-layer team scoping (#58905 [4], #59476 [5]) already prevents cross-team data retrieval in practice. A cross-team request returns a 404 rather than the resource. So the data isolation is there, but the auth stubs don't reject these requests early with a proper 403, and there's no second layer of protection at the auth check itself.
> > > > >
> > > > > For XComs, the situation is different. There is no isolation at any layer. XCom routes take dag_id, run_id, and task_id directly from URL path parameters with no validation against the calling task's identity. A task in Team-A's bundle can currently read and write Team-B's XComs.
> > > > >
> > > > > There's already a get_team_name_dep() function in deps.py that resolves a task's team via TaskInstance -> DagModel -> DagBundleModel -> Team in a single join query. The variable and connection endpoints already use it. XCom routes don't use it at all.
> > > > >
> > > > > Proposed approach:
> > > > >
> > > > > I'm thinking of this in two parts:
> > > > >
> > > > > 1) Team boundary checks for variables and connections
> > > > >
> > > > > Fill the auth stubs with team boundary checks. For reference, the Core API does this in security.py.
> > > > > requires_access_variable() resolves the resource's team via Variable.get_team_name(key), wraps it in VariableDetails, and passes it to auth_manager.is_authorized_variable(method, details, user). The auth manager then checks team membership.
> > > > >
> > > > > For the Execution API, the flow would be similar but without going through the auth manager (I'll explain why below):
> > > > >
> > > > > variable_key -> Variable.get_team_name(key) -> resource_team
> > > > > token.id -> get_team_name_dep() -> task_team
> > > > > Deny if resource_team != task_team (when both are non-None)
> > > > >
> > > > > When core.multi_team is disabled, get_team_name_dep returns None and the check is skipped, so current single-team behavior stays exactly the same.
> > > > >
> > > > > 2) XCom authorization
> > > > >
> > > > > This is the harder part. For writes, I think we should verify the calling task is writing its own XComs -- the task identity from the JWT should match the dag_id/task_id in the URL path. For reads, enforce the team boundary so a task can only read XComs from tasks within the same team. This would allow cross-DAG xcom_pull within a team (which people already do) while preventing cross-team access.
> > > > >
> > > > > To avoid a DB lookup on every request, I'd propose adding dag_id to the JWT claims at generation time. The dag_id is already on the TaskInstance schema in ExecuteTask.make() (workloads.py:142). The JWTReissueMiddleware already preserves all claims during token refresh, so this wouldn't break anything. Adding task_id and run_id to the token could be done as a follow-up -- there's a TODO at xcoms.py:315 about eventually deriving these from the token instead of the URL.
> > > > >
> > > > > I'm not proposing to add team_name to the token. It's not available on the TaskInstance schema at generation time. Resolving it requires a DB join through DagModel -> DagBundleModel -> Team, which would slow down the scheduler's task queuing path. Better to resolve it at request time via get_team_name_dep.
> > > > >
> > > > > Why not go through BaseAuthManager?
> > > > >
> > > > > One design question I want to raise: the Execution API auth stubs currently don't call BaseAuthManager.is_authorized_*(), and I think they probably shouldn't. The BaseAuthManager interface is designed around human identity (BaseUser with roles and team memberships), but the Execution API operates on task identity (TIToken with a UUID). These are very different things. A task doesn't have a "role" in the RBAC sense; it has a team derived from its DAG's bundle.
> > > > >
> > > > > I'm leaning toward keeping the authorization logic directly in the has_*_access dependency functions, using get_team_name_dep for team resolution. This keeps the Execution API auth simple and avoids tying task authorization to the human auth manager. But I'd like to hear if others think we should instead extend BaseAuthManager with task-identity-aware methods.
> > > > >
> > > > > What about single-team deployments?
> > > > >
> > > > > When core.multi_team=False (the default for most deployments), the team boundary checks would be skipped entirely for variables and connections.
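To make that concrete, here is a rough sketch of how the variables/connections stub from (1) above could be filled in, including the multi_team-off no-op just described. The shape follows the flow described in the proposal; the exact signatures of Variable.get_team_name() and the team-resolution dependency are assumptions, not the real code:

```python
from __future__ import annotations

from fastapi import HTTPException, status


def check_team_boundary(resource_team: str | None, task_team: str | None) -> None:
    """Deny only when both sides resolve to a team and the teams differ.

    With core.multi_team disabled the task team resolves to None, so the
    check is a no-op and current single-team behaviour is unchanged.
    """
    if resource_team is not None and task_team is not None and resource_team != task_team:
        raise HTTPException(
            status_code=status.HTTP_403_FORBIDDEN,
            detail="Task is not allowed to access resources owned by another team",
        )


# Hypothetical wiring inside has_variable_access() (names as described above,
# signatures assumed):
#
#   resource_team = Variable.get_team_name(variable_key)  # resource's owning team
#   task_team = get_team_name_dep(...)                    # calling task's team
#   check_team_boundary(resource_team, task_team)
```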
> > > > > For XComs, I think write ownership verification (task can only write its own XComs) is worth keeping regardless of multi-team mode -- it's more of a correctness concern than an authorization one. But I can also see the argument for a complete no-op when multi_team is off to keep things simple.
> > > > >
> > > > > Out of scope:
> > > > >
> > > > > AIP-72 [3] mentions three possible authorization models: pre-declaration (DAGs declare required resources), runtime request with deployment-level policy, and OPA integration via WASM bindings. I'm not trying to address any of those here. The team-boundary enforcement is the base that all three future models need.
> > > > >
> > > > > Implementation plan:
> > > > >
> > > > > 1. Add dag_id claim to JWT token generation in workloads.py
> > > > > 2. Implement has_variable_access team boundary check
> > > > > 3. Implement has_connection_access team boundary check
> > > > > 4. Implement has_xcom_access with write ownership + team boundary
> > > > > 5. Add XCom team resolution (XCom routes currently have no get_team_name_dep usage)
> > > > > 6. Tests for all authorization scenarios including cross-team denial
> > > > > 7. Documentation update for multi-team authorization behavior
> > > > >
> > > > > This should be a fairly small change -- mostly filling in the existing stubs with actual checks.
> > > > >
> > > > > Let me know what you think.
> > > > >
> > > > > Anish
> > > > >
> > > > > [1] https://github.com/apache/airflow/issues/60125#issuecomment-3712218766
> > > > > [2] https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-67+Multi-team+deployment+of+Airflow+components
> > > > > [3] https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-72+Task+Execution+Interface+aka+Task+SDK
> > > > > [4] https://github.com/apache/airflow/pull/58905
> > > > > [5] https://github.com/apache/airflow/pull/59476

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
