nclaeys opened a new issue #20099:
URL: https://github.com/apache/airflow/issues/20099
### Apache Airflow version
2.2.2 (latest released)
### What happened
After deleting a DAG, the scheduler starts crash-looping and cannot recover. This means that an issue with a single DAG takes the whole environment down.
The stack trace is as follows:
```
[2021-12-07 09:30:07,483] {kubernetes_executor.py:791} INFO - Shutting down Kubernetes executor
[2021-12-07 09:30:08,509] {process_utils.py:100} INFO - Sending Signals.SIGTERM to GPID 1472
[2021-12-07 09:30:08,681] {process_utils.py:66} INFO - Process psutil.Process(pid=1472, status='terminated', exitcode=0, started='09:28:37') (1472) terminated with exit code 0
[2021-12-07 09:30:08,681] {scheduler_job.py:655} INFO - Exited execute loop
Traceback (most recent call last):
  File "/home/airflow/.local/bin/airflow", line 8, in <module>
    sys.exit(main())
  File "/home/airflow/.local/lib/python3.9/site-packages/airflow/__main__.py", line 48, in main
    args.func(args)
  File "/home/airflow/.local/lib/python3.9/site-packages/airflow/cli/cli_parser.py", line 48, in command
    return func(*args, **kwargs)
  File "/home/airflow/.local/lib/python3.9/site-packages/airflow/utils/cli.py", line 92, in wrapper
    return f(*args, **kwargs)
  File "/home/airflow/.local/lib/python3.9/site-packages/airflow/cli/commands/scheduler_command.py", line 75, in scheduler
    _run_scheduler_job(args=args)
  File "/home/airflow/.local/lib/python3.9/site-packages/airflow/cli/commands/scheduler_command.py", line 46, in _run_scheduler_job
    job.run()
  File "/home/airflow/.local/lib/python3.9/site-packages/airflow/jobs/base_job.py", line 245, in run
    self._execute()
  File "/home/airflow/.local/lib/python3.9/site-packages/airflow/jobs/scheduler_job.py", line 628, in _execute
    self._run_scheduler_loop()
  File "/home/airflow/.local/lib/python3.9/site-packages/airflow/jobs/scheduler_job.py", line 709, in _run_scheduler_loop
    num_queued_tis = self._do_scheduling(session)
  File "/home/airflow/.local/lib/python3.9/site-packages/airflow/jobs/scheduler_job.py", line 820, in _do_scheduling
    num_queued_tis = self._critical_section_execute_task_instances(session=session)
  File "/home/airflow/.local/lib/python3.9/site-packages/airflow/jobs/scheduler_job.py", line 483, in _critical_section_execute_task_instances
    queued_tis = self._executable_task_instances_to_queued(max_tis, session=session)
  File "/home/airflow/.local/lib/python3.9/site-packages/airflow/utils/session.py", line 67, in wrapper
    return func(*args, **kwargs)
  File "/home/airflow/.local/lib/python3.9/site-packages/airflow/jobs/scheduler_job.py", line 366, in _executable_task_instances_to_queued
    if serialized_dag.has_task(task_instance.task_id):
AttributeError: 'NoneType' object has no attribute 'has_task'
```
### What you expected to happen
I expect the scheduler not to crash just because a DAG was deleted. The bigger issue, however, is that the whole environment goes down: it would be acceptable for the scheduler to have problems with that one DAG (it was deleted, after all), but it should not affect all the other DAGs in the environment.
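For illustration only, the kind of guard that would keep the scheduler alive might look like the standalone sketch below. This is not Airflow's actual code: `SerializedDagStub` and `queue_candidates` are hypothetical names that merely mimic the failing loop in `_executable_task_instances_to_queued`, skipping task instances whose serialized DAG has disappeared instead of dereferencing `None`:

```python
import logging
from typing import Dict, List

log = logging.getLogger(__name__)


class SerializedDagStub:
    """Hypothetical stand-in for a serialized DAG record."""

    def __init__(self, task_ids: List[str]):
        self._task_ids = set(task_ids)

    def has_task(self, task_id: str) -> bool:
        return task_id in self._task_ids


def queue_candidates(
    task_instances: List[Dict[str, str]],
    serialized_dags: Dict[str, SerializedDagStub],
) -> List[Dict[str, str]]:
    """Mimic the loop in _executable_task_instances_to_queued, but
    tolerate DAGs deleted between scheduler iterations."""
    queued = []
    for ti in task_instances:
        serialized_dag = serialized_dags.get(ti["dag_id"])
        if serialized_dag is None:
            # DAG file was deleted: log and skip this task instance
            # instead of raising AttributeError and killing the scheduler.
            log.error(
                "Serialized DAG %s is gone; skipping task %s",
                ti["dag_id"], ti["task_id"],
            )
            continue
        if serialized_dag.has_task(ti["task_id"]):
            queued.append(ti)
    return queued


# Example: the second task instance references a deleted DAG and is skipped.
dags = {"keep_me": SerializedDagStub(["sample"])}
tis = [
    {"dag_id": "keep_me", "task_id": "sample"},
    {"dag_id": "testnielsdev", "task_id": "sample"},  # DAG file deleted
]
assert queue_candidates(tis, dags) == [tis[0]]
```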
### How to reproduce
1. I created the following dag:
```python
from airflow import DAG
from datafy.operators import DatafyContainerOperatorV2
from datetime import datetime, timedelta

default_args = {
    "owner": "Datafy",
    "depends_on_past": False,
    "start_date": datetime(year=2021, month=12, day=1),
    "task_concurrency": 4,
    "retries": 2,
    "retry_delay": timedelta(minutes=5),
}

dag = DAG(
    "testnielsdev",
    default_args=default_args,
    max_active_runs=default_args["task_concurrency"] + 1,
    schedule_interval="0 1 * * *",
)

DatafyContainerOperatorV2(
    dag=dag,
    task_id="sample",
    cmds=["python"],
    arguments=["-m", "testnielsdev.sample", "--date", "{{ ds }}", "--env", "{{ macros.datafy.env() }}"],
    instance_type="mx_small",
    instance_life_cycle="spot",
)
```
Looking at the Airflow code, the most important setting apart from the defaults is `task_concurrency` (a generic variant without our proprietary operator is sketched after these steps).
2. I enable the DAG.
3. I delete it. As soon as the file is removed, the scheduler starts crash-looping.
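Since `DatafyContainerOperatorV2` is our proprietary operator, here is a minimal equivalent using a stock `BashOperator`. This is an untested sketch that assumes the operator class itself is irrelevant and only the `task_concurrency` setting matters:

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

default_args = {
    "owner": "airflow",
    "depends_on_past": False,
    "start_date": datetime(year=2021, month=12, day=1),
    # The setting that routes scheduling through serialized_dag.has_task()
    "task_concurrency": 4,
    "retries": 2,
    "retry_delay": timedelta(minutes=5),
}

dag = DAG(
    "testnielsdev_repro",  # hypothetical DAG id
    default_args=default_args,
    max_active_runs=default_args["task_concurrency"] + 1,
    schedule_interval="0 1 * * *",
)

# Any task should do; the crash happens in the scheduler, not the worker.
BashOperator(dag=dag, task_id="sample", bash_command="echo {{ ds }}")
```

Enable it, wait for it to be scheduled, then delete the file.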
### Operating System
We use the default Airflow Docker image.
### Versions of Apache Airflow Providers
Not relevant
### Deployment
Other Docker-based deployment
### Deployment details
Not relevant
### Anything else
It occurred at one of our customers and I was quickly able to reproduce the issue.
### Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
### Code of Conduct
- [X] I agree to follow this project's [Code of
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)