+1 to Jarek's comments and questions here. I am concerned that these
proposed changes, taken at the level of individual PRs, could create an
illusion of security and lead to a stream of "security bugs" reported by
users whose expectations are very different from what we actually enforce.

We need to clearly articulate a set of security expectations before
addressing this in a set of PRs.

Vikram
On Fri, Feb 20, 2026 at 1:23 PM Jarek Potiuk <[email protected]> wrote:

> I think there is one more thing that I have mentioned a few times before,
> and it's time to put it into more concrete words.
>
> Currently there is **no** protection against tasks making claims that
> they belong to other tasks. While the running task by default receives
> its generated token from the supervisor, there is absolutely nothing
> stopping the forked task from inspecting the parent process's memory,
> obtaining the "supervisor" secret that is used to sign the "task" token,
> and generating a new token with **any** dag_id, task_id, or basically any
> other claim.
>
> This is by design: we currently have no such control implemented, and
> part of the security model of Airflow 3.0 - 3.1 is that any task can
> perform **any** action via the Task SDK. We never even attempt to verify
> which task or DAG state it may modify, or which connections or variables
> it may access. We only need to know that this "task" was authorized by
> the scheduler to call the Task SDK API.
>
> With multi-team, this assumption is broken. We **need** to know and
> **enforce** task identity provenance. One task pretending to be another
> task is no longer acceptable - it violates the basic isolation between
> teams.
>
> As I understand it, the current supervisor -> task JWT token generation
> works like this (please correct me if I am wrong):
>
> * when the supervisor starts, it reads the configuration ("jwt_secret" /
> "jwt_private_key_path" / "jwt_kid")
> * when it starts a task, it uses this secret to generate a JWT token for
> that task (with "dag_id", "dag_run_id", "task_instance_id" claims), and
> the supervisor uses that token to communicate with the api_server
> * the forked task has no direct reference to that token nor to the
> jwt_secret when started - neither is passed to it
> * the executing task process is only supposed to communicate with the
> supervisor via in-process communication; it does not open a connection
> nor use the JWT token directly
>
> Now... the interesting thing is that while the forked process has no
> "easy" API for this, it can not only obtain the token and use it
> directly, but also generate a NEW token. No matter how hard we try, the
> forked task will **always** be able to access the jwt_secret and create
> its own JWT token - with **ANY** claims it likes. That is simply a
> consequence of our fork model. Additionally, the (default) approach of
> running the forked process as the same unix user lets the forked process
> read **any** of the information the supervisor process can access
> (including configuration files, env variables, and even the memory of the
> supervisor process).
>
> There are two ways a running task can get the JWT_SECRET:
>
> * since the task process is forked from the supervisor, everything the
> parent process has in memory is available to it - even if the method
> executed in the fork has no direct reference to it. The forked process
> can use "globals" and reach any variable, function, class, or method that
> the parent supervisor process has; it can read any data in the memory of
> the process. So if the jwt_secret is already in the memory of the parent
> process when the task process is forked, it is also in the memory of the
> task process.
>
> * since the task process runs as the same unix user as the parent
> process, it has access to all the same configuration and environment
> data. Even if the parent process clears os.environ, the child process can
> read the original environment variables the parent process was started
> with via the `/proc` filesystem (it just needs to know the parent process
> id - which it always has). Unless more sophisticated mechanisms are used,
> such as SELinux (requires a kernel with SELinux and system-level SELinux
> rules configured), user impersonation, or cgroups/proper access control
> to files (requires sudo access for the parent process) - such a forked
> process can do **everything** the parent process can do, including
> reading the jwt_secret configuration and creating JWT tokens with (again)
> any task_instance_id, dag_id, and dag_run_id claims.
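>
> To make this concrete, here is a rough sketch of what a malicious forked
> task could do. I am assuming PyJWT and HS512 here, and the env variable
> and claim names are illustrative - not necessarily exactly what the Task
> SDK uses:
>
>     import os
>     import jwt  # PyJWT
>
>     # Way 2: recover the parent's original environment from /proc,
>     # even if the supervisor cleared os.environ after startup. This
>     # works because the child runs as the same unix user.
>     with open(f"/proc/{os.getppid()}/environ", "rb") as f:
>         parent_env = dict(
>             item.split(b"=", 1)
>             for item in f.read().split(b"\x00")
>             if b"=" in item
>         )
>     secret = parent_env[b"AIRFLOW__API_AUTH__JWT_SECRET"].decode()
>
>     # With the secret in hand, mint a token with ANY claims we like.
>     forged = jwt.encode(
>         {"task_instance_id": "any-uuid", "dag_id": "some-other-teams-dag"},
>         secret,
>         algorithm="HS512",
>     )
>
> (Way 1 - walking the parent's globals after fork - does not even need the
> filesystem.)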
>
> So no matter what we do on the "server" side, the client side
> (supervisor) in the default configuration already allows a task to
> pretend to be "whatever dag_id it likes" - in which case server-side
> verification is pointless.
>
> I believe (Ashb? Kaxil? Amogh?) that was a deliberate decision when the
> API was designed and the Task SDK / JWT token for Airflow 3.0 was
> implemented (because we did not need it).
>
> I would love to hear if my thinking is wrong, but I highly doubt it, so I
> wonder what the original thoughts were on how task identity can have
> "cryptographically strong" provenance? I have some ideas for that, but I
> would love to hear the original authors' thoughts first.
>
> J.
>
> On Fri, Feb 20, 2026 at 8:49 PM Anish Giri <[email protected]>
> wrote:
>
> > Thanks, Vincent! I appreciate your review. I'll get started on the
> > implementation and tag you on the PRs.
> >
> > Anish
> >
> > On Fri, Feb 20, 2026 at 8:23 AM Vincent Beck <[email protected]>
> > wrote:
> > >
> > > Hey Anish,
> > >
> > > Everything you said makes sense to me. I might have questions on
> > > specifics, but I'd rather keep them for the PRs - that will make
> > > everything much easier.
> > >
> > > Feel free to ping me on all your PRs,
> > > Vincent
> > >
> > > On 2026/02/20 07:34:47 Anish Giri wrote:
> > > > Hello everyone,
> > > >
> > > > Jarek asked for a proposal on #60125 [1] before implementing access
> > > > control for the Execution API's resource endpoints (variables,
> > > > connections, XComs), so here it is.
> > > >
> > > > After going through the codebase, I think this is really about
> > > > completing AIP-67's [2] multi-team boundary enforcement rather than
> > > > introducing a new security model. Most of the infrastructure
> > > > already exists. What's missing are the actual authorization checks.
> > > >
> > > > The current state:
> > > >
> > > > The Execution API has three authorization stubs that always return
> > > > True:
> > > >
> > > > - has_variable_access() in execution_api/routes/variables.py
> > > > - has_connection_access() in execution_api/routes/connections.py
> > > > - has_xcom_access() in execution_api/routes/xcoms.py
> > > >
> > > > All three carry a "# TODO: Placeholder for actual implementation"
> > > > comment.
> > > >
> > > > For variables and connections, vincbeck's data-layer team scoping
> > > > (#58905 [4], #59476 [5]) already prevents cross-team data retrieval
> > > > in practice: a cross-team request returns a 404 rather than the
> > > > resource. So the data isolation is there, but the auth stubs don't
> > > > reject these requests early with a proper 403, and there's no
> > > > second layer of protection at the auth check itself.
> > > >
> > > > For XComs, the situation is different. There is no isolation at any
> > > > layer. XCom routes take dag_id, run_id, and task_id directly from
> > > > URL path parameters with no validation against the calling task's
> > > > identity. A task in Team-A's bundle can currently read and write
> > > > Team-B's XComs.
> > > >
> > > > There's already a get_team_name_dep() function in deps.py that
> > > > resolves a task's team via TaskInstance -> DagModel ->
> > > > DagBundleModel -> Team in a single join query. The variable and
> > > > connection endpoints already use it. XCom routes don't use it at
> > > > all.
> > > >
> > > > Proposed approach:
> > > >
> > > > I'm thinking of this in two parts:
> > > >
> > > > 1) Team boundary checks for variables and connections
> > > >
> > > > Fill the auth stubs with team boundary checks. For reference, the
> > > > Core API does this in security.py: requires_access_variable()
> > > > resolves the resource's team via Variable.get_team_name(key), wraps
> > > > it in VariableDetails, and passes it to
> > > > auth_manager.is_authorized_variable(method, details, user). The
> > > > auth manager then checks team membership.
> > > >
> > > > For the Execution API, the flow would be similar but without going
> > > > through the auth manager (I'll explain why below):
> > > >
> > > > variable_key -> Variable.get_team_name(key) -> resource_team
> > > > token.id -> get_team_name_dep() -> task_team
> > > > Deny if resource_team != task_team (when both are non-None)
> > > >
> > > > When core.multi_team is disabled, get_team_name_dep returns None
> > > > and the check is skipped, so current single-team behavior stays
> > > > exactly the same.
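> > > >
> > > > Roughly, the filled-in stub could look like the sketch below. The
> > > > dependency wiring is illustrative (I'm assuming a plain FastAPI
> > > > dependency), and the helper names are the ones mentioned above:
> > > >
> > > >     from fastapi import HTTPException, status
> > > >
> > > >     from airflow.models import Variable
> > > >
> > > >     def has_variable_access(
> > > >         variable_key: str,
> > > >         task_team: str | None,  # resolved via get_team_name_dep()
> > > >     ) -> None:
> > > >         """Reject cross-team variable access with an early 403."""
> > > >         resource_team = Variable.get_team_name(variable_key)
> > > >         # Skip the check when multi-team is off (task_team is None)
> > > >         # or the resource is not team-scoped.
> > > >         if (
> > > >             task_team is not None
> > > >             and resource_team is not None
> > > >             and resource_team != task_team
> > > >         ):
> > > >             raise HTTPException(
> > > >                 status_code=status.HTTP_403_FORBIDDEN,
> > > >                 detail="Task's team does not match the variable's team",
> > > >             )
> > > >
> > > > The connection check would be the same shape with
> > > > Connection.get_team_name.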
> > > >
> > > > 2) XCom authorization
> > > >
> > > > This is the harder part. For writes, I think we should verify that
> > > > the calling task is writing its own XComs - the task identity from
> > > > the JWT should match the dag_id/task_id in the URL path. For reads,
> > > > enforce the team boundary, so a task can only read XComs from tasks
> > > > within the same team. This would allow cross-DAG xcom_pull within a
> > > > team (which people already do) while preventing cross-team access.
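> > > >
> > > > The write-ownership part would be something like this (hypothetical
> > > > wiring again - it assumes the task's identity is available from the
> > > > validated token, which is exactly what the next paragraph
> > > > proposes):
> > > >
> > > >     from fastapi import HTTPException, status
> > > >
> > > >     def verify_xcom_write_ownership(
> > > >         token_dag_id: str,
> > > >         token_task_id: str,
> > > >         path_dag_id: str,
> > > >         path_task_id: str,
> > > >     ) -> None:
> > > >         """A task may only write XComs under its own identity."""
> > > >         if (token_dag_id, token_task_id) != (path_dag_id, path_task_id):
> > > >             raise HTTPException(
> > > >                 status_code=status.HTTP_403_FORBIDDEN,
> > > >                 detail="Task may only write its own XComs",
> > > >             )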
> > > >
> > > > To avoid a DB lookup on every request, I'd propose adding dag_id to
> > > > the JWT claims at generation time. The dag_id is already on the
> > > > TaskInstance schema in ExecuteTask.make() (workloads.py:142). The
> > > > JWTReissueMiddleware already preserves all claims during token
> > > > refresh, so this wouldn't break anything. Adding task_id and run_id
> > > > to the token could be done as a follow-up - there's a TODO at
> > > > xcoms.py:315 about eventually deriving these from the token instead
> > > > of the URL.
> > > >
> > > > I'm not proposing to add team_name to the token. It's not available
> > > > on the TaskInstance schema at generation time, and resolving it
> > > > requires a DB join through DagModel -> DagBundleModel -> Team,
> > > > which would slow down the scheduler's task-queuing path. Better to
> > > > resolve it at request time via get_team_name_dep.
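> > > >
> > > > Conceptually, the claim addition is just the sketch below. The
> > > > claim names are illustrative - I'd mirror whatever the existing
> > > > token payload uses:
> > > >
> > > >     def make_task_claims(ti) -> dict[str, str]:
> > > >         """Sketch: claims for the per-task JWT."""
> > > >         return {
> > > >             "sub": str(ti.id),    # existing identity claim
> > > >             "dag_id": ti.dag_id,  # new: lets the server verify provenance
> > > >             # "task_id" / "run_id" could follow as a later step
> > > >         }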
> > > >
> > > > Why not go through BaseAuthManager?
> > > >
> > > > One design question I want to raise: the Execution API auth stubs
> > > > currently don't call BaseAuthManager.is_authorized_*(), and I think
> > > > they probably shouldn't. The BaseAuthManager interface is designed
> > > > around human identity (BaseUser with roles and team memberships),
> > > > while the Execution API operates on task identity (TIToken with a
> > > > UUID). These are very different things. A task doesn't have a
> > > > "role" in the RBAC sense; it has a team derived from its DAG's
> > > > bundle.
> > > >
> > > > I'm leaning toward keeping the authorization logic directly in the
> > > > has_*_access dependency functions, using get_team_name_dep for team
> > > > resolution. This keeps the Execution API auth simple and avoids
> > > > tying task authorization to the human auth manager. But I'd like to
> > > > hear whether others think we should instead extend BaseAuthManager
> > > > with task-identity-aware methods.
> > > >
> > > > What about single-team deployments?
> > > >
> > > > When core.multi_team=False (the default for most deployments), the
> > > > team boundary checks would be skipped entirely for variables and
> > > > connections. For XComs, I think write-ownership verification (a
> > > > task can only write its own XComs) is worth keeping regardless of
> > > > multi-team mode - it's more of a correctness concern than an
> > > > authorization one. But I can also see the argument for a complete
> > > > no-op when multi_team is off, to keep things simple.
> > > >
> > > > Out of scope:
> > > >
> > > > AIP-72 [3] mentions three possible authorization models:
> > > > pre-declaration (DAGs declare required resources), runtime requests
> > > > with a deployment-level policy, and OPA integration via WASM
> > > > bindings. I'm not trying to address any of those here. The
> > > > team-boundary enforcement is the base that all three future models
> > > > need.
> > > >
> > > > Implementation plan:
> > > >
> > > > 1. Add the dag_id claim to JWT token generation in workloads.py
> > > > 2. Implement the has_variable_access team boundary check
> > > > 3. Implement the has_connection_access team boundary check
> > > > 4. Implement has_xcom_access with write ownership + team boundary
> > > > 5. Add XCom team resolution (XCom routes currently have no
> > > > get_team_name_dep usage)
> > > > 6. Tests for all authorization scenarios, including cross-team
> > > > denial
> > > > 7. Documentation update for multi-team authorization behavior
> > > >
> > > > This should be a fairly small change - mostly filling in the
> > > > existing stubs with actual checks.
> > > >
> > > > Let me know what you think.
> > > >
> > > > Anish
> > > >
> > > > [1] https://github.com/apache/airflow/issues/60125#issuecomment-3712218766
> > > > [2] https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-67+Multi-team+deployment+of+Airflow+components
> > > > [3] https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-72+Task+Execution+Interface+aka+Task+SDK
> > > > [4] https://github.com/apache/airflow/pull/58905
> > > > [5] https://github.com/apache/airflow/pull/59476