Dennis just let me know my earlier reply was not to the list but just to him.
I'll add it here for posterity. Pretend this came prior to dennis's last response, on dec 15 at 10:10am pacific: ----- I guess I'm a bit confused. What does it even mean to be put in the queue for a dag run? Lemme try to comment on some points > I did some digging and it looks like this goes back to a commit in 2023 > [3] where there was an Issue [4] requesting that start_date should not get > changed (which I agree with) and I suspect that queued_at was just > collateral damage here and this is safe to fix, but I want to see if anyone > knows of a reason this should remain how it is. Wait why shouldn't start date get changed? Again this is just, because of the unfortunate reality that we want to reuse dag runs and we are struggling to have the same record tell us the story about two different things (first run and later runs) Re queued_at, i'm not sure of any reason why it shouldn't be updated. I would say the same about start date too though. *Let us remember that there is a Log table we can use to record things that happen if we need to look back historically beyond the most recent run.* I feel like any time the run is cleared it should get tossed back on the > queue regardless of the current state, and doing so should update the > existing queued_at column. It's simple, clean, and predictable. Yeah I mean it sounds good to me. But I don't know what it means to queue a dag run -- can you clarify the behavioral implications (as opposed to the strictly recordkeeping implications of updating or not updating the timestamps)? RE "is it breaking", it really sounds like XKCD 1172 <https://xkcd.com/1172/> territory (the only xkcd i know, because it comes up so often), so, my thought is don't worry about it. On Wed, Dec 17, 2025 at 1:07 AM Ephraim Anierobi <[email protected]> wrote: > We also need to account for when task instances are cleared. If a > task-instance is cleared out of many other task-instances in a running > dagrun, should we change the queued_at for the dagrun? > > On Mon, 15 Dec 2025 at 18:17, Ferruzzi, Dennis <[email protected]> > wrote: > >> Would my "temporary" fix work for now, or is there some corner I'm not >> seeing around? >> >> >> - ferruzzi >> >> "Nothing is more permanent than a temporary solution" >> >> >> >> ________________________________ >> From: Daniel Standish via dev <[email protected]> >> Sent: Saturday, December 13, 2025 12:21 PM >> To: [email protected] >> Cc: Daniel Standish >> Subject: RE: [EXT] DagRun queued_at timestamp discussion >> >> CAUTION: This email originated from outside of the organization. Do not >> click links or open attachments unless you can confirm the sender and know >> the content is safe. >> >> >> >> AVERTISSEMENT: Ce courrier électronique provient d’un expéditeur externe. >> Ne cliquez sur aucun lien et n’ouvrez aucune pièce jointe si vous ne pouvez >> pas confirmer l’identité de l’expéditeur et si vous n’êtes pas certain que >> le contenu ne présente aucun risque. >> >> >> >> The main cause here is that we reuse our dag runs, when we shouldn't. >> >> But resolving that is a big project. >> >> On Fri, Dec 12, 2025 at 4:40 PM Ferruzzi, Dennis <[email protected]> >> wrote: >> >> > I'll amend my proposal a little with a less-intrusive proposal. We can >> > leave the RUNNING -> RUNNING transition alone entirely, revert my >> TERMINAL >> > -> QUEUED transition fix, and make the following change in the >> > clear_task_instances method [1] instead: >> > >> > ``` >> > >> > for dr in drs: >> > if dr.state in State.finished_dr_states: >> > # ... existing code ... >> > >> > # Always update queued_at when clearing, regardless of the old state >> > if dag_run_state == DagRunState.QUEUED or dr.state == >> > DagRunState.RUNNING: >> > dr.queued_at = timezone.utcnow() >> > dr.clear_number += 1 >> > ``` >> > >> > >> > That would be super clean and the only effective change would be that >> > queued_at will always reflect the most recent time it was marked as >> ready >> > for scheduling, which I feel was the intent of that column (but I may be >> > wrong) >> > >> > >> > >> > [1] >> > >> https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/models/taskinstance.py#L192 >> > >> > >> > - ferruzzi >> > >> > >> > ________________________________ >> > From: Stefan Wang <[email protected]> >> > Sent: Wednesday, December 10, 2025 9:57 PM >> > To: [email protected] >> > Subject: RE: [EXT] DagRun queued_at timestamp discussion >> > >> > CAUTION: This email originated from outside of the organization. Do not >> > click links or open attachments unless you can confirm the sender and >> know >> > the content is safe. >> > >> > >> > >> > AVERTISSEMENT: Ce courrier électronique provient d’un expéditeur >> externe. >> > Ne cliquez sur aucun lien et n’ouvrez aucune pièce jointe si vous ne >> pouvez >> > pas confirmer l’identité de l’expéditeur et si vous n’êtes pas certain >> que >> > le contenu ne présente aucun risque. >> > >> > >> > >> > Hi Dennis, >> > >> > Thanks for the great investigation work and for opening this discussion! >> > >> > PR #59066 was a clean fix, and I really appreciated how you dug into the >> > SQLAlchemy behavior to understand what was happening. Here are my >> thoughts >> > (though I'm relatively new to the community and still catching up on >> > discussions pre-2023, so please take my input as just one perspective) >> :p >> > The PR solves the immediate problem well. Updating queued_at when >> clearing >> > finished runs makes sense semantically and fixes the Deadline Alert >> issue. >> > The explicit assignment is clear and sidesteps the property-setter >> > persistence issue nicely. >> > >> > 1. RUNNING → RERUNNING vs. RUNNING → QUEUED >> > I can see why this feels unintuitive. My guess is RERUNNING exists to >> > preserve "this was already executing" information, which might be useful >> > for monitoring/debugging?, but I don't have context on the original >> design >> > decisions here. >> > This feels like a bigger state machine question—would be interesting to >> > see thoughts from the community. IIUC, Deadline Alerts should be okay if >> > this stays as-is for now though. >> > >> > 2. Should queued_at update for RERUNNING too? >> > PR #59066 updated queued_at for finished → QUEUED but not for RUNNING → >> > RERUNNING. For Deadline Alert consistency, it probably should update in >> > both cases. The semantic tension is that RERUNNING doesn't actually go >> > through a "queued" stage, but pragmatically it's "starting the execution >> > attempt again," so resetting the timer makes sense to me. >> > >> > 3. queued_at vs. last_queued_at Column >> > In a "perfect world" I'd prefer renaming the existing queued_at column >> to >> > something like last_queued_at, submitted_at, or last_attempt_at for >> clearer >> > semantics. However, I can see how that kind of breaking change would be >> a >> > migration headache without strong enough justification. Keeping >> queued_at >> > as-is seems pragmatic. >> > >> > 4. Queue Time Arrays >> > Agreed this could be a bit over-engineering at this stage. >> > >> > Thanks, >> > Stefan >> > >> > > On Dec 9, 2025, at 2:04 PM, Ferruzzi, Dennis <[email protected]> >> > wrote: >> > > >> > > There's a good bit of context here to get to where I'm going, but the >> > TLDR is "Should the DagRun's queued_at time get updated when a user >> clears >> > the Dag run?" >> > > >> > > Initial Question: When a user has a Deadline using the >> > DeadlineReference.DAGRUN_QUEUED_AT and the Dag run gets cleared, >> shouldn't >> > the timer restart? >> > > Initial Answer: Surely. If my deadline says "email me if this has >> been >> > queued for more than a half hour", then I would expect re-queuing the >> run >> > should reset that timer. >> > > >> > > I was surprised to learn that it doesn't, so I did some digging and >> > found a few related bits that feel odd to me. I'm looking to start up a >> > discussion about whether they are intentional and needed or if they can >> > safely be fixed. Given the code structure, they appear to be intentional >> > decisions, so I want to make sure I'm not missing context before >> changing >> > things. >> > > >> > > What's currently happening: >> > > >> > > When a user clicks on the "clear dag run" button in the UI, we land in >> > models/taskinstance.py::clear_task_instance [1]. If the run was in a >> > terminal state then set the state to QUEUED, if the run was RUNNING then >> > skip the queue and go straight to RERUNNING. Neither case changes the >> > queued_at timestamp. [2] >> > > >> > > I did some digging and it looks like this goes back to a commit in >> 2023 >> > [3] where there was an Issue [4] requesting that start_date should not >> get >> > changed (which I agree with) and I suspect that queued_at was just >> > collateral damage here and this is safe to fix, but I want to see if >> anyone >> > knows of a reason this should remain how it is. >> > > >> > > Discussion Points: >> > > >> > > 1. Why does a finished run get sent to the queue but a RUNNING dag run >> > skip the queue when it is cleared? That seems unintuitive and I would >> > expect it to get tossed to the queue for the scheduler to pick back up >> > regardless of the current state. >> > > 2. Bikeshed time! IF a change is needed (TBD), then should we update >> > queued_at or add a new column perhaps last_queued_at to prevent breaking >> > anyone's workflow? If we're adding a new column, should it be the >> > last_queued_at, or should it be an array showing every time it was >> > re-queued? If the latter, do we add an API so users can access that? >> > > >> > > My thoughts and proposal: >> > > >> > > I feel like any time the run is cleared it should get tossed back on >> the >> > queue regardless of the current state, and doing so should update the >> > existing queued_at column. It's simple, clean, and predictable. If we >> > decide that is a breaking change then my backup proposal is always >> re-queue >> > and add a single-timestamp last_queued_at non-nullable column which >> > defaults to match queued_at at init and gets updated each time the run >> is >> > re-queued. >> > > >> > > >> > > [1] >> > >> https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/models/taskinstance.py#L192 >> > > [2] This is now a lie, I changed the terminal state to update in >> > https://github.com/apache/airflow/pull/59066 but for now let's leave >> that >> > out of the discussion. It may need to be reverted or modified >> depending on >> > how this discussion goes. >> > > [3] https://github.com/apache/airflow/pull/30125 >> > > [4] https://github.com/apache/airflow/issues/30124 >> > > >> > > >> > > >> > > >> > > - ferruzzi >> > >> > >> > --------------------------------------------------------------------- >> > To unsubscribe, e-mail: [email protected] >> > For additional commands, e-mail: [email protected] >> > >> > >> >
