karenbraganz opened a new issue, #47700:
URL: https://github.com/apache/airflow/issues/47700
### Apache Airflow version
2.10.5
### If "Other Airflow 2 version" selected, which one?
_No response_
### What happened?
I encountered an issue where the code in a single DAG crashed the scheduler, resulting in an outage for the entire Airflow instance.
This happened because a user mistakenly included a function object as a value in the `executor_config` dictionary. Certain versions of the `dill` package cannot unpickle function objects. When the DAG containing this code started running, the exception below was raised in the scheduling loop and the scheduler pod started CrashLooping.
```
[2025-03-12T21:23:30.643+0000] {scheduler_job_runner.py:1016} ERROR - Exception when executing SchedulerJob._run_scheduler_loop
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/airflow/jobs/scheduler_job_runner.py", line 999, in _execute
    self._run_scheduler_loop()
  File "/usr/local/lib/python3.12/site-packages/airflow/jobs/scheduler_job_runner.py", line 1138, in _run_scheduler_loop
    num_queued_tis = self._do_scheduling(session)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/airflow/jobs/scheduler_job_runner.py", line 1252, in _do_scheduling
    callback_tuples = self._schedule_all_dag_runs(guard, dag_runs, session)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/airflow/utils/retries.py", line 93, in wrapped_function
    for attempt in run_with_db_retries(max_retries=retries, logger=logger, **retry_kwargs):
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/tenacity/__init__.py", line 443, in __iter__
    do = self.iter(retry_state=retry_state)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/tenacity/__init__.py", line 376, in iter
    result = action(retry_state)
             ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/tenacity/__init__.py", line 398, in <lambda>
    self._add_action_func(lambda rs: rs.outcome.result())
                                     ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/usr/local/lib/python3.12/site-packages/airflow/utils/retries.py", line 102, in wrapped_function
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/airflow/jobs/scheduler_job_runner.py", line 1622, in _schedule_all_dag_runs
    callback_tuples = [(run, self._schedule_dag_run(run, session=session)) for run in dag_runs]
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/airflow/jobs/scheduler_job_runner.py", line 1713, in _schedule_dag_run
    schedulable_tis, callback_to_run = dag_run.update_state(session=session, execute_callbacks=False)
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/airflow/utils/session.py", line 94, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/airflow/models/dagrun.py", line 801, in update_state
    info = self.task_instance_scheduling_decisions(session)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/airflow/utils/session.py", line 94, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/airflow/models/dagrun.py", line 967, in task_instance_scheduling_decisions
    tis = self.get_task_instances(session=session, state=State.task_states)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/airflow/utils/session.py", line 94, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/airflow/models/dagrun.py", line 629, in get_task_instances
    return DagRun.fetch_task_instances(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/airflow/api_internal/internal_api_call.py", line 166, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/airflow/utils/session.py", line 94, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/airflow/models/dagrun.py", line 566, in fetch_task_instances
    return session.scalars(tis).all()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/sqlalchemy/engine/result.py", line 1476, in all
    return self._allrows()
           ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/sqlalchemy/engine/result.py", line 401, in _allrows
    rows = self._fetchall_impl()
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/sqlalchemy/engine/result.py", line 1389, in _fetchall_impl
    return self._real_result._fetchall_impl()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/sqlalchemy/engine/result.py", line 1813, in _fetchall_impl
    return list(self.iterator)
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/sqlalchemy/orm/loading.py", line 147, in chunks
    fetch = cursor._raw_all_rows()
            ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/sqlalchemy/engine/result.py", line 393, in _raw_all_rows
    return [make_row(row) for row in rows]
            ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/airflow/utils/sqlalchemy.py", line 256, in process
    value = super_process(value) # unpickle
            ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/sqlalchemy/sql/sqltypes.py", line 1870, in process
    return loads(value)
           ^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/dill/_dill.py", line 275, in loads
    return load(file, ignore, **kwds)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/dill/_dill.py", line 270, in load
    return Unpickler(file, ignore=ignore, **kwds).load()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/dill/_dill.py", line 472, in load
    obj = StockUnpickler.load(self)
          ^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: code() argument 13 must be str, not int
```
The scheduler kept crashing until the DAG causing this error was identified and removed.
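For anyone hitting the same outage: the offending rows can be narrowed down without deserializing them, since only the `executor_config` column holds pickled payloads. Below is a minimal sketch of that idea; `create_session` and the `TaskInstance` columns are standard Airflow APIs, but the query itself is just an illustration, not something from this incident.
```python
# Sketch: list task instances that carry a pickled executor_config without
# asking SQLAlchemy to unpickle the payload (only identifiers are selected).
# Note that a non-null executor_config is legitimate for many DAGs (e.g.
# KubernetesExecutor pod overrides), so this only narrows the search.
from airflow.models import TaskInstance
from airflow.utils.session import create_session

with create_session() as session:
    rows = (
        session.query(TaskInstance.dag_id, TaskInstance.task_id)
        .filter(TaskInstance.executor_config.isnot(None))
        .distinct()
        .all()
    )

for dag_id, task_id in rows:
    print(dag_id, task_id)
```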
### What you think should happen instead?
We should have safeguards to ensure that DAG code cannot crash the scheduler and cause an outage for the entire Airflow instance. In this case, several other DAGs were unable to run because the scheduler was CrashLooping.
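One possible safeguard, sketched below: have the column type that stores `executor_config` degrade gracefully when unpickling fails, rather than letting the exception propagate into `_run_scheduler_loop`. This mirrors the `result_processor`/`super_process` structure visible in the traceback, but it is only an illustration of the idea, not Airflow's actual `ExecutorConfigType`; whether returning `None` is acceptable for every consumer of the value would need to be checked.
```python
# Sketch of a defensive PickleType: catch unpickling errors at the
# SQLAlchemy result-processing layer (where the traceback above blew up)
# and log instead of crashing the scheduler loop.
import logging

from sqlalchemy import PickleType

log = logging.getLogger(__name__)


class SafePickleType(PickleType):
    cache_ok = True

    def result_processor(self, dialect, coltype):
        super_process = super().result_processor(dialect, coltype)

        def process(value):
            try:
                return super_process(value)  # unpickle
            except Exception:
                log.exception("Could not unpickle stored value; returning None")
                return None

        return process
```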
### How to reproduce
1. Pin `dill==0.3.1.1`.
2. Create a DAG with the code below. This is a simplified version of the code that originally caused the issue.
```python
import datetime

from airflow.decorators import dag
from airflow.decorators import task


def config_value():
    def nested_value():
        print("xyz")

    return nested_value


def return_executor_config():
    executor_config = {
        "key": config_value()
    }
    return executor_config


@dag(start_date=datetime.datetime(2024, 10, 1), schedule="@daily", catchup=False)
def buggy_dag():
    @task(executor_config=return_executor_config())
    def buggy_task():
        print("buggy task")

    buggy_task()


buggy_dag()
```
3. Run the DAG.
4. Observe the scheduler and review its logs once it starts CrashLooping.
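The Airflow pieces appear to be incidental; the same failure should reproduce with `dill` alone. A minimal sketch, assuming the same Python 3.12 + `dill==0.3.1.1` environment pinned above: dill 0.3.1.1 predates the Python 3.11/3.12 code-object changes, so it likely rebuilds code objects with the old `CodeType` argument order, in which the 13th argument was `firstlineno` (an int); on 3.12 the 13th argument is `qualname` (a str), which would explain the exact `TypeError` in the traceback.
```python
# Standalone reproduction sketch, no Airflow required (assumes Python 3.12
# and dill==0.3.1.1). Serialization succeeds; it is reconstruction that
# fails, which is why the crash surfaced in the scheduler reading the row
# rather than in the component that wrote it.
import dill


def config_value():
    def nested_value():
        print("xyz")

    return nested_value


blob = dill.dumps(config_value())  # pickling the nested function works
dill.loads(blob)  # expected: TypeError: code() argument 13 must be str, not int
```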
### Operating System
Debian GNU/Linux 12 (bookworm)
### Versions of Apache Airflow Providers
_No response_
### Deployment
Astronomer
### Deployment details
_No response_
### Anything else?
_No response_
### Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
### Code of Conduct
- [x] I agree to follow this project's [Code of
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)