rcheatham-q opened a new issue #19415:
URL: https://github.com/apache/airflow/issues/19415
### Apache Airflow version
2.2.1 (latest released)
### Operating System
debian
### Versions of Apache Airflow Providers
apache-airflow==2.2.1
apache-airflow-providers-amazon==2.3.0
apache-airflow-providers-ftp==2.0.1
apache-airflow-providers-google==6.0.0
apache-airflow-providers-http==2.0.1
apache-airflow-providers-imap==2.0.1
apache-airflow-providers-jira==2.0.1
apache-airflow-providers-mysql==2.1.1
apache-airflow-providers-postgres==2.3.0
apache-airflow-providers-redis==2.0.1
apache-airflow-providers-sqlite==2.0.1
apache-airflow-providers-ssh==2.2.0
### Deployment
Other Docker-based deployment
### Deployment details
Dask executor, custom-built Docker images, postgres 12.7 backend
### What happened
1. Shut off Airflow cluster
2. Upgraded database from 2.0.2 to 2.2.1
3. Restarted Airflow cluster
4. The scheduler failed to start
5. I checked the logs, and found this:
<details>
<pre>
2021-11-04 21:15:35,566 ERROR - Exception when executing SchedulerJob._run_scheduler_loop
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/airflow/jobs/scheduler_job.py", line 628, in _execute
    self._run_scheduler_loop()
  File "/usr/local/lib/python3.9/site-packages/airflow/jobs/scheduler_job.py", line 709, in _run_scheduler_loop
    num_queued_tis = self._do_scheduling(session)
  File "/usr/local/lib/python3.9/site-packages/airflow/jobs/scheduler_job.py", line 782, in _do_scheduling
    self._create_dagruns_for_dags(guard, session)
  File "/usr/local/lib/python3.9/site-packages/airflow/utils/retries.py", line 76, in wrapped_function
    for attempt in run_with_db_retries(max_retries=retries, logger=logger, **retry_kwargs):
  File "/usr/local/lib/python3.9/site-packages/tenacity/__init__.py", line 382, in __iter__
    do = self.iter(retry_state=retry_state)
  File "/usr/local/lib/python3.9/site-packages/tenacity/__init__.py", line 349, in iter
    return fut.result()
  File "/usr/local/lib/python3.9/concurrent/futures/_base.py", line 438, in result
    return self.__get_result()
  File "/usr/local/lib/python3.9/concurrent/futures/_base.py", line 390, in __get_result
    raise self._exception
  File "/usr/local/lib/python3.9/site-packages/airflow/utils/retries.py", line 85, in wrapped_function
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/airflow/jobs/scheduler_job.py", line 847, in _create_dagruns_for_dags
    self._create_dag_runs(query.all(), session)
  File "/usr/local/lib/python3.9/site-packages/airflow/jobs/scheduler_job.py", line 917, in _create_dag_runs
    self._update_dag_next_dagruns(dag, dag_model, active_runs_of_dags[dag.dag_id])
  File "/usr/local/lib/python3.9/site-packages/airflow/jobs/scheduler_job.py", line 926, in _update_dag_next_dagruns
    if total_active_runs >= dag_model.max_active_runs:
TypeError: '>=' not supported between instances of 'int' and 'NoneType'
</pre>
</details>
6. I checked the code and saw that dag_model.max_active_runs comes directly
from the database and is nullable
7. I updated all values for dag.max_active_runs to be non-null
8. Restarted the scheduler
9. Everything ran fine
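
For anyone hitting the same thing, the workaround in step 7 can be sketched roughly as follows. This is only an illustration: it uses an in-memory SQLite database with a stripped-down `dag` table as a stand-in for the real Postgres metadata database, the helper name `backfill_max_active_runs` is made up for this example, and the value 16 is assumed to match the `[core] max_active_runs_per_dag` setting (adjust it to whatever your deployment uses).

```python
import sqlite3

# Assumption: 16 mirrors the configured [core] max_active_runs_per_dag value.
ASSUMED_DEFAULT_MAX_ACTIVE_RUNS = 16


def backfill_max_active_runs(conn, default=ASSUMED_DEFAULT_MAX_ACTIVE_RUNS):
    """Hypothetical helper: replace NULL max_active_runs values with a default.

    Returns the number of rows that were updated.
    """
    cur = conn.execute(
        "UPDATE dag SET max_active_runs = ? WHERE max_active_runs IS NULL",
        (default,),
    )
    conn.commit()
    return cur.rowcount


# Stand-in for the metadata database: only the two columns needed here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dag (dag_id TEXT PRIMARY KEY, max_active_runs INTEGER)")
conn.executemany(
    "INSERT INTO dag VALUES (?, ?)",
    [("dag_from_before_upgrade", None), ("dag_registered_after_upgrade", 4)],
)

updated = backfill_max_active_runs(conn)
remaining_nulls = conn.execute(
    "SELECT COUNT(*) FROM dag WHERE max_active_runs IS NULL"
).fetchone()[0]
```

Against a real deployment the same `UPDATE` statement would be run on the Postgres backend, with the scheduler stopped and after taking a backup.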
### What you expected to happen
In this case, I would expect the code to properly handle nullable columns,
probably by using the default value provided in the configs.
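
A minimal sketch of the kind of handling I mean, written as standalone Python rather than as a patch to the scheduler. The function names are hypothetical and the constant is an assumption standing in for the configured `[core] max_active_runs_per_dag` value; this is not the actual Airflow code.

```python
from typing import Optional

# Assumption: stands in for the configured [core] max_active_runs_per_dag value.
ASSUMED_CONFIG_DEFAULT = 16


def effective_max_active_runs(
    db_value: Optional[int], default: int = ASSUMED_CONFIG_DEFAULT
) -> int:
    """Return the database value, or the default when the column is NULL."""
    # Compare against None explicitly so that a stored 0 is not replaced.
    return default if db_value is None else db_value


def has_reached_run_limit(
    total_active_runs: int,
    db_value: Optional[int],
    default: int = ASSUMED_CONFIG_DEFAULT,
) -> bool:
    """Null-safe version of the comparison that raised the TypeError."""
    return total_active_runs >= effective_max_active_runs(db_value, default)
```

With this, `has_reached_run_limit(3, None)` compares 3 against the default instead of raising.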
### How to reproduce
This is easily reproducible. Start a fresh instance of Airflow with a
database at version 2.2.1 and follow these steps:
1. Add a DAG to the instance
2. Manually set dag.max_active_runs to null in the database
3. Enable the DAG to cause the scheduler to attempt to parse/schedule it
4. BOOM! The scheduler will crash
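
The failing comparison itself can also be seen in isolation, without Airflow. The snippet below only mimics the two values involved at line 926 of the traceback; the variable names are illustrative.

```python
total_active_runs = 0    # an int, as computed by the scheduler
max_active_runs = None   # what a NULL column value looks like in Python

try:
    total_active_runs >= max_active_runs
except TypeError as exc:
    # Same message as the last line of the scheduler traceback.
    message = str(exc)
```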
### Anything else
Newly registered DAGs have this value populated in the database with the
default value, so this issue will likely only occur on a database upgrade.
### Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
### Code of Conduct
- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)