hussein-awala commented on PR #32684:
URL: https://github.com/apache/airflow/pull/32684#issuecomment-1705569449
> What is the way to recover from that when your task instances have been
> deleted? Would backfilling work, do you think? Or do you need to manually
> delete the DagRun?
For the old DagRuns I did nothing because they are not important. For the
new ones that were blocked because of `depends_on_past`, I had to delete all
of those dag runs along with the empty ones, then wait for the scheduler to
re-create them (catchup is True in my case):
```sql
-- Delete every dag run since the cutoff for any DAG that has at least one
-- run with no task instance in a non-NULL state.
DELETE FROM dag_run dr
WHERE dr.dag_id IN (
    SELECT DISTINCT dr.dag_id
    FROM dag_run dr
    LEFT JOIN task_instance ti
        ON dr.dag_id = ti.dag_id AND dr.run_id = ti.run_id AND ti.state IS NOT NULL
    WHERE ti.task_id IS NULL AND dr.execution_date >= '2023-08-30 08:00:00'
)
AND dr.execution_date >= '2023-08-30 08:00:00';

-- Delete leftover task instances whose dag_id and run_id both appear
-- among those empty runs.
DELETE FROM task_instance ti
WHERE ti.dag_id IN (
    SELECT DISTINCT dr.dag_id
    FROM dag_run dr
    LEFT JOIN task_instance ti
        ON dr.dag_id = ti.dag_id AND dr.run_id = ti.run_id AND ti.state IS NOT NULL
    WHERE ti.task_id IS NULL AND dr.execution_date >= '2023-08-30 08:00:00'
) AND ti.run_id IN (
    SELECT DISTINCT dr.run_id
    FROM dag_run dr
    LEFT JOIN task_instance ti
        ON dr.dag_id = ti.dag_id AND dr.run_id = ti.run_id AND ti.state IS NOT NULL
    WHERE ti.task_id IS NULL AND dr.execution_date >= '2023-08-30 08:00:00'
);
```
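Before running deletes like these, it can help to preview which runs the subquery matches. The following is a minimal sketch and not part of the original cleanup: it reuses the same join and filter as a read-only `SELECT` against an in-memory SQLite database. The two-table schema is a simplified, hypothetical stand-in for Airflow's metadata tables (only the columns the query touches), and the names `find_empty_dag_runs` and `build_demo_db` are made up for illustration.

```python
import sqlite3

# Simplified, hypothetical stand-ins for Airflow's dag_run and task_instance
# tables: only the columns used by the cleanup query are modelled.
SCHEMA = """
CREATE TABLE dag_run (dag_id TEXT, run_id TEXT, execution_date TEXT);
CREATE TABLE task_instance (dag_id TEXT, run_id TEXT, task_id TEXT, state TEXT);
"""

# Same join and filter as the DELETE subqueries, but read-only.
PREVIEW_SQL = """
SELECT dr.dag_id, dr.run_id
FROM dag_run dr
LEFT JOIN task_instance ti
    ON dr.dag_id = ti.dag_id AND dr.run_id = ti.run_id AND ti.state IS NOT NULL
WHERE ti.task_id IS NULL AND dr.execution_date >= ?
ORDER BY dr.dag_id, dr.run_id
"""


def find_empty_dag_runs(conn, since):
    """Return (dag_id, run_id) pairs with no task instance in a non-NULL state."""
    return conn.execute(PREVIEW_SQL, (since,)).fetchall()


def build_demo_db():
    """Create an in-memory database with a few illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO dag_run VALUES (?, ?, ?)",
        [
            ("dag_a", "run_1", "2023-08-30 09:00:00"),  # has a task instance with a state
            ("dag_a", "run_2", "2023-08-30 10:00:00"),  # has no task instances at all
            ("dag_b", "run_1", "2023-08-29 10:00:00"),  # empty, but before the cutoff
            ("dag_b", "run_2", "2023-08-30 11:00:00"),  # only a task instance with NULL state
        ],
    )
    conn.executemany(
        "INSERT INTO task_instance VALUES (?, ?, ?, ?)",
        [
            ("dag_a", "run_1", "t1", "success"),
            ("dag_b", "run_2", "t1", None),
        ],
    )
    return conn


if __name__ == "__main__":
    demo = build_demo_db()
    for dag_id, run_id in find_empty_dag_runs(demo, "2023-08-30 08:00:00"):
        print(dag_id, run_id)
```

Note that the preview lists only the empty runs. The first `DELETE` above is broader: it removes every run at or after the cutoff for any DAG that appears in this list.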
Then, to fix the issue completely without upgrading to 2.7.0, I cherry-picked
the fix (https://github.com/leboncoin/airflow/tree/lbc/2.6.3-r1), built Airflow,
and pushed it to our private PyPI. I then used the patched version instead of
the official one.