+1 on both: making the change, and restricting it to Airflow 3. Apart from
some concerns about the name itself, I never remember what kind of tasks
are zombies and what triggers that state.

I think "zombie" is a somewhat overloaded term, especially in the container
world, where you already have zombie processes (when your init process does
not reap them properly), and that might be confusing.

Naming it explicitly, even if the name is longer, might be a bit more obvious.
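
For the configuration rename proposed in the quoted message below, a
deprecation fallback could look roughly like this sketch. It is only an
illustration: RENAMED_OPTIONS and resolve_option are hypothetical names, the
settings are a plain dict rather than Airflow's real configuration object, and
this is not a description of how Airflow handles renamed options.

```python
import warnings

# Hypothetical mapping from the old option names to the names proposed in
# this thread. Nothing here is taken from Airflow's own code.
RENAMED_OPTIONS = {
    "scheduler_zombie_task_threshold": "scheduler_task_heartbeat_timeout_threshold",
    "zombie_detection_interval": "task_heartbeat_timeout_detection_interval",
}


def resolve_option(settings: dict[str, str], new_name: str, default: str) -> str:
    """Return the value for new_name, falling back to the old name with a warning."""
    # Prefer the new name whenever the user has already migrated.
    if new_name in settings:
        return settings[new_name]
    # Otherwise accept the deprecated name, but tell the user about the rename.
    for old_name, renamed_to in RENAMED_OPTIONS.items():
        if renamed_to == new_name and old_name in settings:
            warnings.warn(
                f"Option {old_name!r} is deprecated; use {new_name!r} instead.",
                DeprecationWarning,
                stacklevel=2,
            )
            return settings[old_name]
    # Neither name is set, so use the caller-supplied default.
    return default
```

With a fallback of this kind, a deployment that still sets
zombie_detection_interval would keep working and only see a deprecation
warning until it switches to the new name.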

On Wed, 12 Feb 2025 at 13:15, kalyan reddy <kalyan.be...@live.com> wrote:

> +1 to the idea and to restrict the change to Airflow 3 only
> ________________________________
> From: Wei Lee <weilee...@gmail.com>
> Sent: 12 February 2025 17:01
> To: dev@airflow.apache.org <dev@airflow.apache.org>
> Subject: Re: Updating "zombie task" terminology to "task heartbeat timeout"
>
> I like this idea as well, but I'm not sure whether it would affect
> monitoring. 🤔 If we’re to introduce it, we’d better make it Airflow 3 only
> and make sure we add a migration rule, as we’re changing the configuration.
>
> Best,
> Wei
>
> > On Feb 12, 2025, at 6:10 AM, Ryan Hatter <ryan.hat...@astronomer.io.invalid> wrote:
> >
> > I love it. "heartbeat timeout" is obvious and has meaning in software
> > beyond Airflow, so it makes sense to stick with this verbiage and use it to
> > replace "zombie" in docs, configs, logs, and code IMO.
> >
> > On Tue, Feb 11, 2025 at 4:15 PM Karen Braganza <karenbraganz...@gmail.com> wrote:
> >
> >> Hi,
> >>
> >> I have been working on this PR
> >> <https://github.com/apache/airflow/pull/46257> to update our documentation
> >> on zombie tasks to reflect the terminology used in the user-facing event
> >> logs in Airflow 2.10+. The event logs use the terminology "heartbeat
> >> timeout" whereas the documentation uses the terminology "zombie tasks". I
> >> would like to update the documentation to focus on the "heartbeat timeout"
> >> terminology so that users are able to find and understand this
> >> documentation easily when they see a "heartbeat timeout" in the event logs.
> >>
> >> In the same vein, I think other user-facing configurations should also be
> >> updated to use the same terminology. I am proposing that we make the
> >> following changes to Airflow configuration variables:
> >>
> >> scheduler_zombie_task_threshold  -->  scheduler_task_heartbeat_timeout_threshold
> >> zombie_detection_interval --> task_heartbeat_timeout_detection_interval
> >>
> >> In addition to this, I propose that we also change the logs emitted by the
> >> scheduler to use the "task heartbeat timeout" terminology.
> >>
> >> For example, the below logs
> >> <https://github.com/apache/airflow/blob/dea2cc9afc61caf49621c3b1923bcf90e96e17e9/airflow/jobs/scheduler_job_runner.py#L2040>:
> >> self.log.error(
> >>     "Detected zombie job: %s "
> >>     "(See https://airflow.apache.org/docs/apache-airflow/"
> >>     "stable/core-concepts/tasks.html#zombie-tasks)",
> >>     request,
> >> )
> >>
> >> should become:
> >>
> >> self.log.error(
> >>     "Detected task heartbeat timeout: %s "
> >>     "(See https://airflow.apache.org/docs/apache-airflow/"
> >>     "stable/core-concepts/tasks.html#zombie-tasks)",
> >>     request,
> >> )
> >>
> >> I wanted to start this discussion to get everyone's thoughts on my
> >> proposal. Do you agree (or disagree) that at least all user-facing elements
> >> of Airflow should use the "task heartbeat timeout" terminology instead of
> >> "zombie tasks" for uniformity?
> >>
> >> I can add all of these changes to my PR.
> >>
> >> Best,
> >> Karen Braganza
> >>
> >>
> >> <https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html#zombie-detection-interval>
> >>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> For additional commands, e-mail: dev-h...@airflow.apache.org
>
>
