Another +1 to retiring "zombie" and using clearer, more accurate language
instead.

On Wed, Feb 12, 2025 at 4:56 AM Abhishek Bhakat
<abhishek.bha...@astronomer.io.invalid> wrote:

> +1 to no more "zombie"
>
> Avi
>
> On Wed, Feb 12, 2025 at 12:38 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>
> > +1 on both: making the change, and doing it in Airflow 3. Apart from some
> > concerns about the name itself, I never remember what kind of tasks are
> > zombies and what triggers that.
> >
> > I think zombie is a bit of an overloaded term, especially in the container
> > world, where you already have zombie processes (when your init process
> > does not do zombie process reaping properly), and that might be confusing.
> >
> > Explicitly naming it, even if the name is longer, might be a bit more
> > obvious.
> >
> > On Wed, Feb 12, 2025 at 13:15, kalyan reddy <kalyan.be...@live.com> wrote:
> >
> > > +1 to the idea and to restrict the change to Airflow 3 only
> > > ________________________________
> > > From: Wei Lee <weilee...@gmail.com>
> > > Sent: 12 February 2025 17:01
> > > To: dev@airflow.apache.org <dev@airflow.apache.org>
> > > Subject: Re: Updating "zombie task" terminology to "task heartbeat timeout"
> > >
> > > I like this idea as well, but I am not sure whether it would affect
> > > monitoring. 🤔 If we’re to introduce it, we’d better make it Airflow 3
> > > only and make sure we add a migration rule, as we’re changing the
> > > configuration.
> > >
> > > Best,
> > > Wei
> > >
> > > > On Feb 12, 2025, at 6:10 AM, Ryan Hatter <ryan.hat...@astronomer.io.invalid> wrote:
> > > >
> > > > > I love it. "heartbeat timeout" is obvious and has meaning in software
> > > > > beyond Airflow, so it makes sense to stick with this verbiage and use
> > > > > it to replace "zombie" in docs, configs, logs, and code IMO.
> > > >
> > > > > On Tue, Feb 11, 2025 at 4:15 PM Karen Braganza <karenbraganz...@gmail.com> wrote:
> > > >
> > > >> Hi,
> > > >>
> > > > >> I have been working on this PR
> > > > >> <https://github.com/apache/airflow/pull/46257> to update our
> > > > >> documentation on zombie tasks to reflect the terminology used in the
> > > > >> user-facing event logs in Airflow 2.10+. The event logs use the
> > > > >> terminology "heartbeat timeout" whereas the documentation uses the
> > > > >> terminology "zombie tasks". I would like to update the documentation
> > > > >> to focus on the "heartbeat timeout" terminology so that users are
> > > > >> able to find and understand this documentation easily when they see
> > > > >> a "heartbeat timeout" in the event logs.
> > > >>
> > > > >> In the same vein, I think other user-facing configurations should
> > > > >> also be updated to use the same terminology. I am proposing that we
> > > > >> make the following changes to Airflow configuration variables:
> > > >>
> > > > >> scheduler_zombie_task_threshold  -->  scheduler_task_heartbeat_timeout_threshold
> > > > >> zombie_detection_interval  -->  task_heartbeat_timeout_detection_interval
> > > >>
> > > > >> In addition to this, I propose that we also change the logs emitted
> > > > >> by the scheduler to use the "task heartbeat timeout" terminology.
> > > >>
> > > > >> For example, the below logs
> > > > >> <https://github.com/apache/airflow/blob/dea2cc9afc61caf49621c3b1923bcf90e96e17e9/airflow/jobs/scheduler_job_runner.py#L2040>:
> > > > >> self.log.error(
> > > > >>     "Detected zombie job: %s "
> > > > >>     "(See https://airflow.apache.org/docs/apache-airflow/"
> > > > >>     "stable/core-concepts/tasks.html#zombie-tasks)",
> > > > >>     request,
> > > > >> )
> > > >>
> > > >> should become:
> > > >>
> > > > >> self.log.error(
> > > > >>     "Detected task heartbeat timeout: %s "
> > > > >>     "(See https://airflow.apache.org/docs/apache-airflow/"
> > > > >>     "stable/core-concepts/tasks.html#zombie-tasks)",
> > > > >>     request,
> > > > >> )
> > > >>
> > > > >> I wanted to start this discussion to get everyone's thoughts on my
> > > > >> proposal. Do you agree (or disagree) that at least all user-facing
> > > > >> elements of Airflow should use the "task heartbeat timeout"
> > > > >> terminology instead of "zombie tasks" for uniformity?
> > > >>
> > > >> I can add all of these changes to my PR.
> > > >>
> > > >> Best,
> > > >> Karen Braganza
> > > >>
> > > >>
> > > >> <
> > > >>
> > >
> >
> https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html#zombie-detection-interval
> > > >>>
> > > >>
> > >
> > >

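For illustration only, the "migration rule" idea raised in the thread for the two proposed renames could look roughly like the sketch below. It is a minimal sketch under stated assumptions, not Airflow's actual configuration code: `RENAMED_OPTIONS` and `get_option` are hypothetical names, and the settings are modelled as a plain dict rather than a real Airflow config object. Only the old and new option names come from the proposal itself.

```python
import warnings

# Old name -> new name, taken from the renames proposed in this thread.
RENAMED_OPTIONS = {
    "scheduler_zombie_task_threshold": "scheduler_task_heartbeat_timeout_threshold",
    "zombie_detection_interval": "task_heartbeat_timeout_detection_interval",
}


def get_option(settings, new_name, default=None):
    """Hypothetical helper: read new_name from a plain dict of settings,
    falling back to the deprecated old name with a warning."""
    if new_name in settings:
        return settings[new_name]
    for old_name, renamed_to in RENAMED_OPTIONS.items():
        if renamed_to == new_name and old_name in settings:
            warnings.warn(
                f"Option {old_name!r} is deprecated; use {new_name!r} instead.",
                DeprecationWarning,
                stacklevel=2,
            )
            return settings[old_name]
    return default
```

With a fallback of this kind, a deployment that still sets `zombie_detection_interval` would keep working while being nudged toward the new name; the new name wins whenever both are present.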