Hello, I have skimmed the PR and overall it looks good.
I have not yet been able to think of a use case where this feature would be
useful, so I would appreciate an example. The PR introduces quite a few
changes (including a new table and new dependency types) for a feature
that allows task groups to be retried.

It might be simpler to implement with something like a composite task
instance, but I do not want to propose anything before I hear more about
the use case, as I am most likely just missing something.
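
For reference, this is roughly how I read the proposed group-level semantics.
It is only a sketch to check my understanding: it does not use Airflow, every
name in it (GroupRetryPolicy, run_group, any_failed) is hypothetical and not
taken from the PR, and retry_delay/backoff are left out entirely.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical model, not Airflow code: a task is a callable that receives
# the attempt number and returns True on success.
Task = Callable[[int], bool]
States = Dict[str, str]  # task name -> "success" or "failed"


@dataclass
class GroupRetryPolicy:
    retries: int                                # extra attempts for the whole group
    retry_condition: Callable[[States], bool]   # decides whether the group retries
    retry_fast_fail: bool = False               # stop the attempt once the condition holds


def any_failed(states: States) -> bool:
    """Example condition: retry when at least one task in the group failed."""
    return "failed" in states.values()


def run_group(tasks: Dict[str, Task], policy: GroupRetryPolicy) -> List[States]:
    """Run the group as a unit and return the task states of every attempt."""
    history: List[States] = []
    for attempt in range(1, policy.retries + 2):
        states: States = {}
        for name, task in tasks.items():
            states[name] = "success" if task(attempt) else "failed"
            if policy.retry_fast_fail and policy.retry_condition(states):
                break  # short-circuit the remaining tasks in this attempt
        history.append(states)
        if not policy.retry_condition(states):
            break  # the group finished without needing another attempt
    return history


# "load" fails on the first attempt only, so the whole group runs twice.
tasks = {
    "extract": lambda attempt: True,
    "load": lambda attempt: attempt >= 2,
    "report": lambda attempt: True,
}
policy = GroupRetryPolicy(retries=2, retry_condition=any_failed, retry_fast_fail=True)
history = run_group(tasks, policy)
```

If that reading is right, "extract" is re-run on the second attempt even
though it already succeeded, which is the part I would like to see a concrete
use case for.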

Best regards,
Natanel.

On Wed, 18 Feb 2026 at 17:49, Jorge Rocamora García <
[email protected]> wrote:

> Hi all,
>
> I’d like to start a discussion around Task Group retries.
>
> Issue: https://github.com/apache/airflow/issues/21867
> PR: https://github.com/apache/airflow/pull/61809
>
> This PR introduces a proof of concept for TaskGroup retries, allowing a
> whole TaskGroup to be retried as a unit rather than relying only on
> individual task retries.
>
> In addition to standard retry parameters (retries, retry_delay,
> exponential backoff, etc.), this proposal introduces TaskGroup-specific
> retry semantics, including:
>
> * retry_condition: allows defining when a group should be retried (e.g.,
>   based on aggregated task states), enabling more flexible policies than
>   simple failure-based retries.
> * retry_fast_fail: enables fail-fast behavior within the group, so that once
>   a retry-triggering condition is met, the group can short-circuit remaining
>   tasks and move directly to retry handling.
>
> The implementation adds retry configuration to TaskGroup, introduces a
> task_group_instance model to persist retry state per DagRun, and includes
> scheduler logic to evaluate retry conditions, enforce delay/backoff, and
> clear group tasks for subsequent attempts. The feature is opt-in and does
> not affect existing DAGs unless configured.
>
> I’d appreciate feedback on:
>
> * The proposed API.
> * The scheduler and state-management approach.
> * The new model/migration.
> * Whether the retry semantics feel intuitive and consistent with existing
>   task-level retries.
> * ..
>
> If there is general agreement on the direction, I’m happy to continue
> refining the implementation.
>
> Best,
> Jorge
>
>
