karenbraganz opened a new issue, #47700:
URL: https://github.com/apache/airflow/issues/47700

   ### Apache Airflow version
   
   2.10.5
   
   ### If "Other Airflow 2 version" selected, which one?
   
   _No response_
   
   ### What happened?
   
   I encountered an issue where the code in a single DAG crashed the scheduler, resulting in an outage for the entire Airflow instance.
   
   This happened because a user mistakenly included a function object as a value in the `executor_config` dictionary. Certain versions of the dill package cannot unpickle function objects. When the DAG containing this code started running, the exception below was raised in the scheduling loop and the scheduler pod started CrashLooping.
   ```
   [2025-03-12T21:23:30.643+0000] {scheduler_job_runner.py:1016} ERROR - Exception when executing SchedulerJob._run_scheduler_loop
   Traceback (most recent call last):
     File "/usr/local/lib/python3.12/site-packages/airflow/jobs/scheduler_job_runner.py", line 999, in _execute
       self._run_scheduler_loop()
     File "/usr/local/lib/python3.12/site-packages/airflow/jobs/scheduler_job_runner.py", line 1138, in _run_scheduler_loop
       num_queued_tis = self._do_scheduling(session)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/usr/local/lib/python3.12/site-packages/airflow/jobs/scheduler_job_runner.py", line 1252, in _do_scheduling
       callback_tuples = self._schedule_all_dag_runs(guard, dag_runs, session)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/usr/local/lib/python3.12/site-packages/airflow/utils/retries.py", line 93, in wrapped_function
       for attempt in run_with_db_retries(max_retries=retries, logger=logger, **retry_kwargs):
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/usr/local/lib/python3.12/site-packages/tenacity/__init__.py", line 443, in __iter__
       do = self.iter(retry_state=retry_state)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/usr/local/lib/python3.12/site-packages/tenacity/__init__.py", line 376, in iter
       result = action(retry_state)
                ^^^^^^^^^^^^^^^^^^^
     File "/usr/local/lib/python3.12/site-packages/tenacity/__init__.py", line 398, in <lambda>
       self._add_action_func(lambda rs: rs.outcome.result())
                                        ^^^^^^^^^^^^^^^^^^^
     File "/usr/local/lib/python3.12/concurrent/futures/_base.py", line 449, in result
       return self.__get_result()
              ^^^^^^^^^^^^^^^^^^^
     File "/usr/local/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
       raise self._exception
     File "/usr/local/lib/python3.12/site-packages/airflow/utils/retries.py", line 102, in wrapped_function
       return func(*args, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^
     File "/usr/local/lib/python3.12/site-packages/airflow/jobs/scheduler_job_runner.py", line 1622, in _schedule_all_dag_runs
       callback_tuples = [(run, self._schedule_dag_run(run, session=session)) for run in dag_runs]
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/usr/local/lib/python3.12/site-packages/airflow/jobs/scheduler_job_runner.py", line 1713, in _schedule_dag_run
       schedulable_tis, callback_to_run = dag_run.update_state(session=session, execute_callbacks=False)
                                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/usr/local/lib/python3.12/site-packages/airflow/utils/session.py", line 94, in wrapper
       return func(*args, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^
     File "/usr/local/lib/python3.12/site-packages/airflow/models/dagrun.py", line 801, in update_state
       info = self.task_instance_scheduling_decisions(session)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/usr/local/lib/python3.12/site-packages/airflow/utils/session.py", line 94, in wrapper
       return func(*args, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^
     File "/usr/local/lib/python3.12/site-packages/airflow/models/dagrun.py", line 967, in task_instance_scheduling_decisions
       tis = self.get_task_instances(session=session, state=State.task_states)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/usr/local/lib/python3.12/site-packages/airflow/utils/session.py", line 94, in wrapper
       return func(*args, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^
     File "/usr/local/lib/python3.12/site-packages/airflow/models/dagrun.py", line 629, in get_task_instances
       return DagRun.fetch_task_instances(
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/usr/local/lib/python3.12/site-packages/airflow/api_internal/internal_api_call.py", line 166, in wrapper
       return func(*args, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^
     File "/usr/local/lib/python3.12/site-packages/airflow/utils/session.py", line 94, in wrapper
       return func(*args, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^
     File "/usr/local/lib/python3.12/site-packages/airflow/models/dagrun.py", line 566, in fetch_task_instances
       return session.scalars(tis).all()
              ^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/usr/local/lib/python3.12/site-packages/sqlalchemy/engine/result.py", line 1476, in all
       return self._allrows()
              ^^^^^^^^^^^^^^^
     File "/usr/local/lib/python3.12/site-packages/sqlalchemy/engine/result.py", line 401, in _allrows
       rows = self._fetchall_impl()
              ^^^^^^^^^^^^^^^^^^^^^
     File "/usr/local/lib/python3.12/site-packages/sqlalchemy/engine/result.py", line 1389, in _fetchall_impl
       return self._real_result._fetchall_impl()
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/usr/local/lib/python3.12/site-packages/sqlalchemy/engine/result.py", line 1813, in _fetchall_impl
       return list(self.iterator)
              ^^^^^^^^^^^^^^^^^^^
     File "/usr/local/lib/python3.12/site-packages/sqlalchemy/orm/loading.py", line 147, in chunks
       fetch = cursor._raw_all_rows()
               ^^^^^^^^^^^^^^^^^^^^^^
     File "/usr/local/lib/python3.12/site-packages/sqlalchemy/engine/result.py", line 393, in _raw_all_rows
       return [make_row(row) for row in rows]
               ^^^^^^^^^^^^^
     File "/usr/local/lib/python3.12/site-packages/airflow/utils/sqlalchemy.py", line 256, in process
       value = super_process(value)  # unpickle
               ^^^^^^^^^^^^^^^^^^^^
     File "/usr/local/lib/python3.12/site-packages/sqlalchemy/sql/sqltypes.py", line 1870, in process
       return loads(value)
              ^^^^^^^^^^^^
     File "/usr/local/lib/python3.12/site-packages/dill/_dill.py", line 275, in loads
       return load(file, ignore, **kwds)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/usr/local/lib/python3.12/site-packages/dill/_dill.py", line 270, in load
       return Unpickler(file, ignore=ignore, **kwds).load()
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/usr/local/lib/python3.12/site-packages/dill/_dill.py", line 472, in load
       obj = StockUnpickler.load(self)
             ^^^^^^^^^^^^^^^^^^^^^^^^^
   TypeError: code() argument 13 must be str, not int
   ```
   
   The issue continued until the DAG causing this error was identified and 
removed.
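For context on why the stored value is so fragile: stdlib `pickle` serializes functions by reference (module plus qualified name), so it refuses a function defined inside another function outright, while dill works around that by pickling the underlying code object by value. My reading of the traceback is that the by-value code-object payload written by one dill/Python pairing did not match the `code()` constructor expected at read time, hence `TypeError: code() argument 13 must be str, not int`. A small stdlib-only sketch of the first half of that story:

```python
import pickle


def config_value():
    # Same shape as the buggy DAG: a function defined inside another
    # function, returned as a config value.
    def nested_value():
        print("xyz")
    return nested_value


fn = config_value()

# Stdlib pickle cannot serialize a <locals> function at all, because
# there is no importable name to store a reference to.
try:
    pickle.dumps(fn)
    stdlib_pickle_failed = False
except (pickle.PicklingError, AttributeError) as exc:
    stdlib_pickle_failed = True
    print(f"stdlib pickle refused it: {exc!r}")

assert stdlib_pickle_failed
```

dill accepts this value at write time, which is exactly why the mistake only surfaced later, inside the scheduler, at read time.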
   
   ### What you think should happen instead?
   
   We should have safeguards to ensure that code in a single DAG cannot crash the scheduler and cause an outage for the entire Airflow instance. In this case, several other DAGs were unable to run while the scheduler was CrashLooping.
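One possible shape for such a safeguard (a hypothetical sketch, not Airflow's actual API): have the column deserializer degrade to a default on failure, so a single bad row fails only its own task instance instead of raising out of the scheduling loop. Illustrated with stdlib `pickle` standing in for dill:

```python
import pickle


def safe_deserialize(blob: bytes, default=None):
    """Hypothetical guard for a stored executor_config blob.

    If one row's payload cannot be unpickled, log and return a default
    rather than letting the exception escape into the scheduler's main
    loop and take down every DAG.
    """
    try:
        return pickle.loads(blob)
    except Exception as exc:  # dill raised a plain TypeError here, so catch broadly
        print(f"dropping undeserializable executor_config: {exc!r}")
        return default


# A valid config round-trips untouched:
good = pickle.dumps({"KubernetesExecutor": {"request_memory": "1Gi"}})
assert safe_deserialize(good) == {"KubernetesExecutor": {"request_memory": "1Gi"}}

# A corrupted or incompatible payload is dropped instead of raising:
assert safe_deserialize(b"\x00not a pickle") is None
```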
   
   ### How to reproduce
   
   1. Pin dill==0.3.1.1.
   2. Create a DAG with the code below. This is a simplified version of the code that originally caused the issue.
   ```python
   import datetime

   from airflow.decorators import dag, task


   def config_value():
       def nested_value():
           print("xyz")

       return nested_value


   def return_executor_config():
       return {"key": config_value()}


   @dag(start_date=datetime.datetime(2024, 10, 1), schedule="@daily", catchup=False)
   def buggy_dag():
       @task(executor_config=return_executor_config())
       def buggy_task():
           print("buggy task")

       buggy_task()


   buggy_dag()
   ```
   3. Run the DAG.
   4. Observe the scheduler, and review the logs when it starts CrashLooping.
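Until a scheduler-side fix lands, a workaround could be a parse-time check in the DAG file (or in a cluster policy such as Airflow's `task_policy` hook) that requires every `executor_config` value to survive a serialization round-trip before the task is ever scheduled. A hypothetical sketch, with stdlib `pickle` standing in for dill; `validate_executor_config` and `make_bad_config` are illustrative names, not Airflow API:

```python
import pickle


def validate_executor_config(config: dict) -> None:
    """Fail fast, in the DAG file, if a config value cannot round-trip.

    Stdlib pickle is stricter than dill, which is useful here: dill can
    pickle values that it later fails to unpickle.
    """
    for key, value in config.items():
        try:
            pickle.loads(pickle.dumps(value))
        except Exception as exc:
            raise ValueError(
                f"executor_config[{key!r}] does not survive a pickle "
                f"round-trip: {exc!r}"
            ) from exc


def make_bad_config():
    def nested_value():  # local function, as in the buggy DAG
        print("xyz")
    return {"key": nested_value}


validate_executor_config({"key": "a plain string"})  # passes silently
try:
    validate_executor_config(make_bad_config())
except ValueError as exc:
    print(f"rejected at parse time: {exc}")
```

A failure raised this way shows up as a DAG import error in the UI, rather than as a scheduler crash.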
   
   ### Operating System
   
   Debian GNU/Linux 12 (bookworm)
   
   ### Versions of Apache Airflow Providers
   
   _No response_
   
   ### Deployment
   
   Astronomer
   
   ### Deployment details
   
   _No response_
   
   ### Anything else?
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

