Hi Jorge, I really appreciate your thinking on this and the direction of the proposed changes.
Task Groups were originally conceived as a UI-only construct. They were intended to make it easier for users to view their DAGs, not to change how those DAGs run.

I definitely support the change you are proposing here at a conceptual level. I am struggling a little with reviewing the PR at the moment, but I do intend to spend more time looking at it.

My one key area of concern is the database changes needed, especially the DB migrations required. This is purely a caution to consider during implementation and as part of rollout.

I am looking forward to seeing this evolve.

Vikram

On Wed, Feb 18, 2026 at 7:17 PM Jorge Rocamora García <[email protected]> wrote:

> Hi all,
>
> I’d like to clarify that several concrete use cases were already described
> in the original issue: https://github.com/apache/airflow/issues/21867
>
> One important aspect is that with the deprecation of SubDAGs in favor of
> TaskGroups, some retry semantics were lost.
>
> In my specific case, I’m using the KubernetesPodOperator, where different
> steps must run in separate pods because they depend on different software.
> However, conceptually, the entire block needs to behave as a single
> logical unit. For example:
>
> - A: Create a PersistentVolumeClaim (PVC) to share data
> - B: Retrieve and prepare inputs
> - C: Run the analysis
> - D: Remove the PVC
>
> This pattern was previously achievable with SubDAGs, but there is currently
> no straightforward mechanism that preserves this grouped execution and
> retry behavior.
>
> Best regards,
> Jorge
>
> On 2026/02/18 22:20:10 Daniel Standish via dev wrote:
> > Yeah I think arguing that there’s a need for it with use cases is a good
> > idea.
> >
> >
> > On Wed, Feb 18, 2026 at 12:02 PM Natanel <[email protected]> wrote:
> >
> > > Hello, I have skimmed over the PR, overall I have to say that it looks
> > > good.
> > > I have yet to find a use case for this (as I just can't think of one) where
> > > I find the feature useful, and I will appreciate it if you could give an
> > > example use case for the feature, as it looks like quite a bit of changes
> > > have been introduced (including a new table and new dependency types) for a
> > > feature which allows for task groups to be retried.
> > >
> > > I would love to hear about what the use case of the feature is, as I just
> > > can't think of one. I think that it might be simpler to implement if we do
> > > something like a composite task instance, yet I do not want to propose
> > > anything before I hear more about the use case, as I am most likely just
> > > missing something.
> > >
> > > Best regards,
> > > Natanel.
> > >
> > > On Wed, 18 Feb 2026 at 17:49, Jorge Rocamora García <[email protected]> wrote:
> > >
> > > > Hi all,
> > > >
> > > > I’d like to start a discussion around Task Group retries.
> > > >
> > > > Issue: https://github.com/apache/airflow/issues/21867
> > > > PR: https://github.com/apache/airflow/pull/61809
> > > >
> > > > This PR introduces a proof of concept for TaskGroup retries, allowing a
> > > > whole TaskGroup to be retried as a unit rather than relying only on
> > > > individual task retries.
> > > >
> > > > In addition to standard retry parameters (retries, retry_delay,
> > > > exponential backoff, etc.), this proposal introduces TaskGroup-specific
> > > > retry semantics, including:
> > > >
> > > > * retry_condition: allows defining when a group should be retried (e.g.,
> > > >   based on aggregated task states), enabling more flexible policies than
> > > >   simple failure-based retries.
> > > > * retry_fast_fail: enables fail-fast behavior within the group, so that once
> > > >   a retry-triggering condition is met, the group can short-circuit remaining
> > > >   tasks and move directly to retry handling.
> > > >
> > > > The implementation adds retry configuration to TaskGroup, introduces a
> > > > task_group_instance model to persist retry state per DagRun, and includes
> > > > scheduler logic to evaluate retry conditions, enforce delay/backoff, and
> > > > clear group tasks for subsequent attempts. The feature is opt-in and does
> > > > not affect existing DAGs unless configured.
> > > >
> > > > I’d appreciate feedback on:
> > > >
> > > > * The proposed API.
> > > > * The scheduler and state-management approach.
> > > > * The new model/migration.
> > > > * Whether the retry semantics feel intuitive and consistent with existing
> > > >   task-level retries.
> > > > * ..
> > > >
> > > > If there is general agreement on the direction, I’m happy to continue
> > > > refining the implementation.
> > > >
> > > > Best,
> > > > Jorge
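
For readers following the thread, here is a rough, Airflow-free sketch of the group-level retry semantics described above, applied to the A/B/C/D PVC example. It is only an illustration under stated assumptions: `run_group`, `GroupResult`, `any_failed`, and `make_step` are hypothetical names invented here, not the API from the PR or from Airflow. Steps run strictly in order, and `retry_delay`/backoff is left out to keep the example short.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical stand-ins for illustration only; not Airflow or PR code.
Step = Callable[[int], bool]                    # gets the attempt number, returns success
RetryCondition = Callable[[Dict[str, str]], bool]


def any_failed(states: Dict[str, str]) -> bool:
    """Default condition: retry the group if any step has failed."""
    return "failed" in states.values()


@dataclass
class GroupResult:
    attempts: int                  # number of group attempts actually made
    succeeded: bool                # True if the last attempt ran every step successfully
    history: List[Dict[str, str]]  # per-attempt mapping of step name to state


def run_group(
    steps: Dict[str, Step],
    retries: int,
    retry_condition: RetryCondition = any_failed,
    retry_fast_fail: bool = True,
) -> GroupResult:
    """Run ordered steps as one unit, retrying the whole group on failure."""
    if retries < 0:
        raise ValueError("retries must be >= 0")
    history: List[Dict[str, str]] = []
    for attempt in range(1, retries + 2):       # first try plus `retries` retries
        states: Dict[str, str] = {}
        for name, step in steps.items():
            states[name] = "success" if step(attempt) else "failed"
            # Fail fast: skip the remaining steps once a retry is already due.
            if retry_fast_fail and retry_condition(states):
                break
        history.append(states)
        if not retry_condition(states):
            break                               # nothing to retry, stop here
    succeeded = len(states) == len(steps) and all(
        state == "success" for state in states.values()
    )
    return GroupResult(attempt, succeeded, history)


calls = []  # records (attempt, step name) so the execution order can be inspected


def make_step(name: str, fail_on=()) -> Step:
    """Build a fake step that fails on the given attempt numbers."""
    def run(attempt: int) -> bool:
        calls.append((attempt, name))
        return attempt not in fail_on
    return run


steps = {
    "create_pvc": make_step("create_pvc"),                # A
    "prepare_inputs": make_step("prepare_inputs"),        # B
    "run_analysis": make_step("run_analysis", fail_on={1}),  # C fails on attempt 1
    "remove_pvc": make_step("remove_pvc"),                # D
}
result = run_group(steps, retries=2)
print(result.attempts, result.succeeded)
```

In this run the analysis step fails on the first attempt, so the group short-circuits before `remove_pvc` and then reruns all four steps from `create_pvc`. With per-task retries alone, only the failed step would be rerun, which is the gap the thread is discussing. A real implementation would also have to persist attempt state per DagRun and honor delays, as the proposal notes.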
