ephraimbuddy commented on a change in pull request #20349:
URL: https://github.com/apache/airflow/pull/20349#discussion_r774992478
##########
File path: airflow/jobs/scheduler_job.py
##########
@@ -402,6 +402,14 @@ def _executable_task_instances_to_queued(self, max_tis: int, session: Session =
# Many dags don't have a task_concurrency, so where we can avoid loading the full
# serialized DAG the better.
serialized_dag = self.dagbag.get_dag(dag_id, session=session)
+ # If the dag is missing, continue to the next task.
+ if not serialized_dag:
+     self.log.error(
+         "DAG '%s' for taskinstance %s not found in serialized_dag table",
+         dag_id,
+         task_instance,
+     )
+     continue
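A minimal, self-contained sketch of the guard pattern this hunk adds: when the DAG lookup returns `None`, log an error and skip that task instance instead of dereferencing `None` later. `FakeDag`, `InMemoryDagBag` and `select_queueable` are hypothetical stand-ins for illustration. They are not Airflow's real classes; only the skip-and-log pattern mirrors the diff.

```python
import logging
from typing import Dict, List, Optional, Tuple

log = logging.getLogger("scheduler_sketch")


class FakeDag:
    """Hypothetical stand-in for a serialized DAG (not Airflow's DAG class)."""

    def __init__(self, dag_id: str):
        self.dag_id = dag_id


class InMemoryDagBag:
    """Hypothetical DagBag whose get_dag returns None for DAG ids it does
    not know, e.g. because the DAG is no longer available."""

    def __init__(self, dags: Dict[str, FakeDag]):
        self._dags = dags

    def get_dag(self, dag_id: str) -> Optional[FakeDag]:
        return self._dags.get(dag_id)


def select_queueable(
    dagbag: InMemoryDagBag, task_instances: List[Tuple[str, str]]
) -> List[Tuple[str, str]]:
    """Keep only the (dag_id, task_id) pairs whose DAG can still be loaded."""
    queueable = []
    for dag_id, task_id in task_instances:
        dag = dagbag.get_dag(dag_id)
        # Same pattern as the hunk: log the missing DAG and move on to the
        # next task instance instead of dereferencing None later.
        if dag is None:
            log.error("DAG '%s' for task instance %s not found", dag_id, task_id)
            continue
        queueable.append((dag_id, task_id))
    return queueable


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    bag = InMemoryDagBag({"present": FakeDag("present")})
    tis = [("present", "t1"), ("airflow_bug", "mytasrk97")]
    print(select_queueable(bag, tis))  # [('present', 't1')]
```

The missing DAG produces one error log line and is dropped. The remaining task instances are still processed, so a single deleted DAG cannot stall the whole loop.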
Review comment:
Sorry for the delay; I wanted to reproduce it properly before responding.
Here are the reproduction steps.
First, run this DAG:
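```python
from airflow import DAG
from datetime import datetime

# concurrency=1: at most one task of this DAG runs at a time, so most of
# the 100 tasks are still waiting when the DAG file is removed.
dag = DAG(
    "airflow_bug",
    schedule_interval="0 1 * * *",
    start_date=datetime(2021, 1, 1),
    max_active_runs=1,
    concurrency=1,
)

# 100 tasks that each sleep for a minute, keeping the run in progress.
for i in range(100):
    @dag.task(task_id=f'mytasrk{i}')
    def sleeping():
        import time
        time.sleep(60)

    sleeping()
```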
Once you unpause the DAG and the tasks start running, remove the DAG file from the DAGs folder.
Watch the scheduler logs: after some time the scheduler starts crashing and does not recover (until all the tasks have failed, I think).
```
[2021-12-24 05:55:22,877] {scheduler_job.py:623} ERROR - Executor reports task instance <TaskInstance: airflow_bug.mytasrk97 scheduled__2021-01-01T01:00:00+00:00 [queued]> finished (failed) although the task says its queued. (Info: None) Was the task killed externally?
[2021-12-24 05:55:22,881] {scheduler_job.py:630} ERROR - Marking task instance <TaskInstance: airflow_bug.mytasrk97 scheduled__2021-01-01T01:00:00+00:00 [queued]> as failed
Traceback (most recent call last):
  File "/opt/airflow/airflow/jobs/scheduler_job.py", line 628, in _process_executor_events
    task = dag.get_task(ti.task_id)
AttributeError: 'NoneType' object has no attribute 'get_task'
```
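The traceback is an ordinary `AttributeError` from calling a method on `None`. At line 628 of `_process_executor_events`, `dag` is `None` (presumably because the DAG can no longer be loaded once its file is gone), yet `dag.get_task(...)` is called unconditionally. The sketch below reproduces that failure mode in isolation and shows one possible guard that logs the event and skips it. `FakeDag`, `handle_event_unguarded` and `handle_event_guarded` are hypothetical. This is neither Airflow's actual code nor necessarily the fix adopted in this PR.

```python
import logging
from typing import Dict, Iterable, Optional

log = logging.getLogger("executor_events_sketch")


class FakeDag:
    """Hypothetical DAG stand-in exposing only get_task."""

    def __init__(self, dag_id: str, task_ids: Iterable[str]):
        self.dag_id = dag_id
        self._tasks = {task_id: f"<Task {task_id}>" for task_id in task_ids}

    def get_task(self, task_id: str) -> str:
        return self._tasks[task_id]


def handle_event_unguarded(dags: Dict[str, FakeDag], dag_id: str, task_id: str) -> str:
    dag = dags.get(dag_id)  # None if the DAG can no longer be loaded
    return dag.get_task(task_id)  # raises AttributeError when dag is None


def handle_event_guarded(dags: Dict[str, FakeDag], dag_id: str, task_id: str) -> Optional[str]:
    dag = dags.get(dag_id)
    if dag is None:
        # Log and skip this event instead of crashing the scheduler loop.
        log.error("DAG '%s' not found; cannot process event for %s", dag_id, task_id)
        return None
    return dag.get_task(task_id)


if __name__ == "__main__":
    dags = {"present": FakeDag("present", ["t1"])}
    try:
        handle_event_unguarded(dags, "airflow_bug", "mytasrk97")
    except AttributeError as exc:
        print(exc)  # 'NoneType' object has no attribute 'get_task'
    print(handle_event_guarded(dags, "airflow_bug", "mytasrk97"))  # None
    print(handle_event_guarded(dags, "present", "t1"))  # <Task t1>
```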
cc: @kaxil
--
This is an automated message from the Apache Git Service.