On 2025-04-03 19:28, Brent Bovenzi wrote:
The issue is that duration is based on start and end dates. If there is no end date, we usually default to now. But that is misleading when a dag run is running but the dag is paused.
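The duration rule described above can be sketched in Python. `DagRunTimes` and `run_duration` are hypothetical names used only for illustration, not Airflow's actual models or API. The sketch assumes just that a run has a start date and an optional end date, with "now" substituted while the end date is missing:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class DagRunTimes:
    """Hypothetical stand-in for a DAG run's timestamps (not an Airflow class)."""
    start_date: datetime
    end_date: Optional[datetime] = None  # None while the run is still "running"


def run_duration(run: DagRunTimes, now: Optional[datetime] = None) -> timedelta:
    """Duration is end_date - start_date; an open run is measured up to 'now'."""
    if now is None:
        now = datetime.now(timezone.utc)
    end = run.end_date if run.end_date is not None else now
    return end - run.start_date


# A run of a paused DAG never gets an end_date, so its duration keeps growing.
start = datetime(2025, 4, 3, 9, 0, tzinfo=timezone.utc)
paused_run = DagRunTimes(start_date=start)
print(run_duration(paused_run, now=start + timedelta(hours=8)))  # 8:00:00
```

Nothing in this rule knows about the DAG being paused, which is why a paused run looks like a very long one.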
Let me take a look at where we use duration in the 3.0 UI and see if we can reduce that confusion. We don't have the "5 longest dag runs" in our new dashboard page, which replaces cluster activity. If we wanted that feature again, we should be mindful of this and filter out paused dags in the API request.
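A minimal sketch of what "filter out paused dags" could mean for a "longest dag runs" list, written as plain in-memory Python rather than as a real API query. `RunRow`, `dag_is_paused`, and `longest_runs` are hypothetical names; an actual endpoint would more likely apply the same condition in its database query:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class RunRow:
    """Hypothetical row pairing a DAG run with its DAG's paused flag."""
    dag_id: str
    run_id: str
    start_date: datetime
    end_date: Optional[datetime]
    dag_is_paused: bool


def longest_runs(rows: List[RunRow], now: datetime, limit: int = 5) -> List[RunRow]:
    """Return the `limit` longest runs, skipping runs whose DAG is paused."""
    def duration(row: RunRow) -> timedelta:
        # Same rule as before: a run without an end_date is measured up to "now".
        end = row.end_date if row.end_date is not None else now
        return end - row.start_date

    active = [row for row in rows if not row.dag_is_paused]
    return sorted(active, key=duration, reverse=True)[:limit]


now = datetime(2025, 4, 3, 18, 0, tzinfo=timezone.utc)
rows = [
    RunRow("etl", "r1", now - timedelta(days=3), None, dag_is_paused=True),
    RunRow("report", "r2", now - timedelta(hours=2), None, dag_is_paused=False),
    RunRow("backup", "r3", now - timedelta(hours=5), now - timedelta(hours=1), dag_is_paused=False),
]
print([r.dag_id for r in longest_runs(rows, now)])  # ['backup', 'report']
```

The paused `etl` run would otherwise top the list with three days of "runtime".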



On Thu, Apr 3, 2025, 1:27 PM Pedro Nunes Leal
<pedro.n.l...@tecnico.ulisboa.pt.invalid> wrote:

On 2025-03-31 22:26, Jens Scheffler wrote:
> Hi,
>
> thanks for working on the bug and raising a PR to fix it.
>
> As other committers have also commented, from a product view I'd
> expect a different resolution. We use "Pause DAG" in most cases for
> administrative or infrastructure problems, to prevent further failures
> and/or to drain infrastructure so we can switch some backend.
>
> I assume that when we pause a long-running DAG that is in between
> task executions, we really want to "pause" scheduling; we don't want
> to set it to failed. That would also not be correct, because once we
> un-pause, the running DAGs should continue to work. I see no reason to
> mark them as failed and then have to go back manually to reset the
> state later.
>
> My view on this is that, as also proposed in the discussion of the bug,
> we should rather filter paused DAGs out of cluster activity reporting,
> so that paused DAGs are not reported with excessive runtime. Also, if
> the DAG is later un-paused, it would be "right" for the overall DAG
> runtime to be longer than normal (I would not expect the paused time
> to be deducted from the runtime of the DAG).
>
> If I (as operator/admin) really want to terminate existing running
> instances, I'd rather go through Browse -> DAG Runs -> filter for
> running runs of the paused DAG ID, and mark them as failed explicitly.
>
> Jens
>
> On 31.03.25 20:50, Pedro Nunes Leal wrote:
>> Hello everyone,
>>
>> Currently, I'm trying to fix this bug:
>> https://github.com/apache/airflow/issues/44443
>>
>> Basically, the issue is that DAG runs stay stuck in the running state
>> even though their DAG is paused.
>> Consequently, the duration of the DAG run keeps increasing while the
>> DAG is paused.
>>
>> My proposal to solve this problem is to change the state of these DAG
>> runs from running to failed when the DAG is paused, so that their
>> duration stops increasing.
>>
>> Since this can be an impactful change, I would like to hear what
>> others think about it.
>>
>> Link for the Pull Request:
>> https://github.com/apache/airflow/pull/47557
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
>> For additional commands, e-mail: dev-h...@airflow.apache.org
>>
>
That can be a better approach.

However, if I'm not mistaken, the code related to the cluster activity
page doesn't exist in Airflow 3 (the version where I'm trying to do the
changes).

So what should I do in this case?
Is there any other way to solve this problem that doesn't involve
cluster activity?

Changing the state to queued instead of failed was my original proposal,
and it really pauses the DAG.
This is the type of solution I had in mind because, as I said before in
the pull request, I feel that the cluster activity behavior is just a
symptom of a bigger problem (the DAGs don't really pause, they just
keep running).



Hello,

Any update on the use of duration in the 3.0 UI?

Maybe this bug isn't really an issue if cluster activity was removed in the newer version, and it's just something to keep in mind in case something similar to cluster activity is implemented in the 3.0 UI.

From what I understand, the current behavior of staying in the running state, with the duration increasing, is what is expected from the pause functionality.

