Yes, that makes sense. TriggerDagRunOperator lives in the standard provider, which still supports 2.11 and 3.0-3.3, but the accessor and mixin are 3.3+ only. So the operator can only own its execution on 3.3+; everything older still goes through the path we have today. Shipping this into standard now wouldn't reduce the duplication, it'd add to it: 3.3+ on the accessor, 3.0-3.2 still on the exception path, 2.11 still on the Airflow 2 path. The POC (https://github.com/apache/airflow/pull/69135) bears that out: the 3.3+ end-state is clean and the live crash/reconnect holds up, but it also puts the back-compat cost up for discussion.
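
To make the three-way split concrete, here is a minimal sketch of the version gate the standard provider would end up carrying if the accessor shipped today. The function name and the path labels are made up for illustration; nothing here is actual provider code, and the real gate would wrap imports and operator logic rather than return a string.

```python
from typing import Tuple


def execution_path(airflow_version: Tuple[int, int]) -> str:
    """Return an illustrative label for the path the operator would take.

    The labels are hypothetical names for the three paths described above.
    """
    if airflow_version >= (3, 3):
        # Operator owns execution: accessor + mixin are available.
        return "accessor"
    if airflow_version >= (3, 0):
        # execute() raises DagRunTriggerException; the task runner drives the wait.
        return "exception"
    # Airflow 2.11: the existing Airflow 2 path.
    return "airflow2"


for version in [(2, 11), (3, 0), (3, 2), (3, 3)]:
    print(version, execution_path(version))
```

All three branches have to stay in the provider until its minimum Airflow version reaches 3.3, which is the duplication cost in question.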
The two things that hurt long-term maintainability are the contract drifting in two places (Amogh's point) and a version-gated fork we can't delete. So my read on the direction:

- The runner keeps driving the wait (as it does now), but through ResumableJobMixin's extracted core instead of a hand-rolled copy (roughly what https://github.com/apache/airflow/pull/68952 and https://github.com/apache/airflow/pull/68955 do). That way we keep one contract instead of two, with no version fork.
- The accessor (operator owns execution) is the right end-state. The catch is that it only stops being extra duplication once the standard provider can drop 2.11 / 3.0-3.2 and require 3.3+; below that, the old fallbacks have to stay. The POC shows what that end-state looks like, and the move from the shared core to it later is small.

The accessor end-state needs a 3.3+ min. Whether and when we'd require that is the min_version-for-providers question being raised here, and +1 to reopening it. With the 12-month rule the floor would be around April 2027, so the shared-core interim would only need to cover ~9 months. Maybe that’s short enough that going straight to the accessor instead would also be reasonable?

Happy to try out whichever way gets consensus.

Stefan

> On Jun 29, 2026, at 12:26 AM, Jarek Potiuk <[email protected]> wrote:
>
> I think my main concern is that TriggerDagRunOperator is part of the
> standard provider, which currently means:
>
> * Airflow 2.11.0 support
> * Airflow 3.0 -> 3.3 support
>
> The ti.trigger_dag_run() and the mixin are 3.3+, and the implementation
> also needs to account for the deferrable path.
>
> Even 3.0 -> 3.2 is challenging, and I guess 2.11 will be nightmarish in
> this scenario - we might end up with way more duplication than what current
> https://github.com/apache/airflow/pull/68936 introduces.
>
> And of course unifying and "owning" execution is a better direction.
> Switching to persisting run_id makes so much more sense, so if we can find a
> nice way without bumping standard to 3.3+ that would be great.
>
> BTW. Should we resume discussion about bumping min_version for Providers?
> If we don't return to the regular schedule - we will again become
> (quoting Daniel Standish) - long term backwards compatibility engineers.
> And if we have no clear vision - like we had in Airflow 2 regarding the
> "min_version" approach - this will get worse month-by-month.
>
> Currently we have no idea how long we will need to support a back-compat
> solution like this. This makes it difficult to make rational choices about
> the level of duplication, deprecation, or backward compatibility worth
> maintaining because we don't know the maintenance duration. Therefore,
> working out reasonable trade-offs here is nearly impossible.
>
> If we started to apply the same rule we had in Airflow 2 (12 months since
> .0 version release) we would have:
>
> * Today we would already have >= 3.1
> * 25th of September we would have >= 3.2
> * 7th of April 2027 we would have >= 3.3
>
> This would mean that we would have 9 months of support for 3.2 until we
> could get rid of any back-compat.
>
> J.
>
>
> On Mon, Jun 29, 2026 at 8:32 AM Stefan Wang <[email protected]> wrote:
>
>> Thanks Amogh and Jarek:
>>
>> +1, this makes sense and is a better approach to take. Letting the
>> operator own its execution and just subclass ResumableJobMixin is cleaner
>> than what https://github.com/apache/airflow/pull/68936 does today.
>>
>> Right now the contract is duplicated in the task runner, and the proposed
>> approach gets rid of the special case instead of trying to share it.
>>
>> The accessor is small. trigger_dag_run() would mirror the
>> ti.get_dagrun_state() we already have, hitting the same execution API
>> endpoint the runner uses today with the same token, so no new authz.
>> It basically finishes the AIP-72 migration that added DagRunTriggerException
>> as a stopgap (https://github.com/apache/airflow/pull/47882).
>>
>> Happy to do the POC, rework #68936 onto the accessor, and then link it
>> here.
>> Would the main thing to check be back-compat? execute() currently
>> raises on every Airflow 3 run, not just the ones that wait, so in the
>> POC we would want to prove that behavior stays identical?
>>
>> Best,
>> Stefan
>>
>>> On Jun 28, 2026, at 10:38 PM, Jarek Potiuk <[email protected]> wrote:
>>>
>>> Sounds reasonable - maybe a quick POC would be good to show what it could
>>> look like and allow us to assess if there are any back-compat concerns.
>>>
>>> On Mon, Jun 29, 2026 at 7:27 AM Amogh Desai <[email protected]> wrote:
>>>
>>>> Now that Airflow 3.3 will introduce ResumableJobMixin to make synchronous
>>>> submit and poll operators crash-safe, I wanted to start a discussion.
>>>>
>>>> I came across https://github.com/apache/airflow/pull/68936, which brings
>>>> crash recovery/durable execution to TriggerDagRunOperator, but it's a case
>>>> which cannot use the mixin. On Airflow 3 the operator's execute() raises
>>>> *DagRunTriggerException*; the actual trigger and the wait loop run in the
>>>> task runner (_handle_trigger_dag_run). So the PR reimplements the mixin's
>>>> three-state contract (succeeded / reconnect / resubmit) and
>>>> persist-before-poll by hand in the runner. This means that we will now
>>>> have two copies of the same contract that can drift.
>>>>
>>>> It cannot use the mixin's contract because it only offloads its execution
>>>> to the task runner, and doesn't own it. For more context, the poll
>>>> primitive already exists as a user-callable accessor
>>>> (ti.get_dagrun_state()). The only missing primitive is triggering a dag run.
>>>>
>>>> I propose that we revisit this portion.
>>>> I propose we introduce an execution API accessor in the Task SDK for
>>>> triggering dag runs, which will be the counterpart to the existing
>>>> ti.get_dagrun_state(). It routes through the same execution endpoint the
>>>> runner already uses, so no new authz surface is added.
>>>>
>>>> This proposal does not expand what task code can do, it just gives a
>>>> first-class way to do something already possible. A task JWT can already
>>>> trigger dag runs through the Execution API today: that is exactly what
>>>> DagRunTriggerException does under the hood. The proposed
>>>> *ti.trigger_dag_run()* accessor routes through the same endpoint with
>>>> the same scoped token, so the boundary is identical, just reached through
>>>> a clean, supported API instead of an exception side channel.
>>>>
>>>> Happy to hear thoughts from folks.
>>>>
>>>>
>>>> Thanks & Regards,
>>>> Amogh Desai
