shivanshs9 opened a new issue #11899: URL: https://github.com/apache/airflow/issues/11899
**Apache Airflow version**: v2.0.0a2

**Environment**:

- **Database**: MariaDB

**What happened**:

The scheduler main process crashed repeatedly (7 crashes observed in just 4 minutes). The crashes occurred only when the `max_threads` option in the scheduler section was set greater than 1 (in this case, 2) with `use_row_level_locking = True`. Setting either `max_threads = 1` or `use_row_level_locking = False` stopped the crashes, but both are more of a hack than a fix.

**What you expected to happen**:

The scheduler process to run normally.

**How to reproduce it**:

**Anything else we need to know**:

<details>
<summary>Scheduler logs</summary>

```
[2020-10-26 09:03:54,608] {{settings.py:49}} INFO - Configured default timezone Timezone('UTC')
[2020-10-26 09:04:05,467] {{scheduler_job.py:1327}} ERROR - Exception when executing SchedulerJob._run_scheduler_loop
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1277, in _execute_context
    self.dialect.do_execute(
  File "/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 593, in do_execute
    cursor.execute(statement, parameters)
  File "/home/airflow/.local/lib/python3.8/site-packages/MySQLdb/cursors.py", line 255, in execute
    self.errorhandler(self, exc, value)
  File "/home/airflow/.local/lib/python3.8/site-packages/MySQLdb/connections.py", line 50, in defaulterrorhandler
    raise errorvalue
  File "/home/airflow/.local/lib/python3.8/site-packages/MySQLdb/cursors.py", line 252, in execute
    res = self._query(query)
  File "/home/airflow/.local/lib/python3.8/site-packages/MySQLdb/cursors.py", line 379, in _query
    self._do_get_result(db)
  File "/home/airflow/.local/lib/python3.8/site-packages/MySQLdb/cursors.py", line 182, in _do_get_result
    self._result = result = self._get_result()
  File "/home/airflow/.local/lib/python3.8/site-packages/MySQLdb/cursors.py", line 411, in _get_result
    return self._get_db().store_result()
_mysql_exceptions.OperationalError: (1213, 'Deadlock found when trying to get lock; try restarting transaction')

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/jobs/scheduler_job.py", line 1308, in _execute
    self._run_scheduler_loop()
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/jobs/scheduler_job.py", line 1379, in _run_scheduler_loop
    num_queued_tis = self._do_scheduling(session)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/jobs/scheduler_job.py", line 1451, in _do_scheduling
    self._create_dag_runs(query.all(), session)
  File "/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 3341, in all
    return list(self)
  File "/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 3503, in __iter__
    return self._execute_and_instances(context)
  File "/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 3528, in _execute_and_instances
    result = conn.execute(querycontext.statement, self._params)
  File "/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1014, in execute
    return meth(self, multiparams, params)
  File "/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1127, in _execute_clauseelement
    ret = self._execute_context(
  File "/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1317, in _execute_context
    self._handle_dbapi_exception(
  File "/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1511, in _handle_dbapi_exception
    util.raise_(
  File "/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/util/compat.py", line 178, in raise_
    raise exception
  File "/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1277, in _execute_context
    self.dialect.do_execute(
  File "/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 593, in do_execute
    cursor.execute(statement, parameters)
  File "/home/airflow/.local/lib/python3.8/site-packages/MySQLdb/cursors.py", line 255, in execute
    self.errorhandler(self, exc, value)
  File "/home/airflow/.local/lib/python3.8/site-packages/MySQLdb/connections.py", line 50, in defaulterrorhandler
    raise errorvalue
  File "/home/airflow/.local/lib/python3.8/site-packages/MySQLdb/cursors.py", line 252, in execute
    res = self._query(query)
  File "/home/airflow/.local/lib/python3.8/site-packages/MySQLdb/cursors.py", line 379, in _query
    self._do_get_result(db)
  File "/home/airflow/.local/lib/python3.8/site-packages/MySQLdb/cursors.py", line 182, in _do_get_result
    self._result = result = self._get_result()
  File "/home/airflow/.local/lib/python3.8/site-packages/MySQLdb/cursors.py", line 411, in _get_result
    return self._get_db().store_result()
sqlalchemy.exc.OperationalError: (_mysql_exceptions.OperationalError) (1213, 'Deadlock found when trying to get lock; try restarting transaction')
[SQL: SELECT dag.dag_id AS dag_dag_id, dag.root_dag_id AS dag_root_dag_id, dag.is_paused AS dag_is_paused, dag.is_subdag AS dag_is_subdag, dag.is_active AS dag_is_active, dag.last_scheduler_run AS dag_last_scheduler_run, dag.last_pickled AS dag_last_pickled, dag.last_expired AS dag_last_expired, dag.scheduler_lock AS dag_scheduler_lock, dag.pickle_id AS dag_pickle_id, dag.fileloc AS dag_fileloc, dag.owners AS dag_owners, dag.description AS dag_description, dag.default_view AS dag_default_view, dag.schedule_interval AS dag_schedule_interval, dag.concurrency AS dag_concurrency, dag.has_task_concurrency_limits AS dag_has_task_concurrency_limits, dag.next_dagrun AS dag_next_dagrun, dag.next_dagrun_create_after AS dag_next_dagrun_create_after
FROM dag
WHERE dag.is_paused IS false AND dag.is_active IS true AND dag.next_dagrun_create_after <= now() ORDER BY dag.next_dagrun_create_after
LIMIT %s FOR UPDATE]
[parameters: (10,)]
(Background on this error at: http://sqlalche.me/e/13/e3q8)
[2020-10-26 09:04:06,512] {{process_utils.py:102}} INFO - Sending Signals.SIGTERM to GPID 7437
[2020-10-26 09:04:07,029] {{process_utils.py:68}} INFO - Process psutil.Process(pid=7762, status='terminated', started='09:04:05') (7762) terminated with exit code None
[2020-10-26 09:04:07,122] {{process_utils.py:219}} INFO - Waiting up to 5 seconds for processes to exit...
[2020-10-26 09:04:07,126] {{process_utils.py:68}} INFO - Process psutil.Process(pid=7774, status='terminated', started='09:04:05') (7774) terminated with exit code None
[2020-10-26 09:04:07,128] {{process_utils.py:68}} INFO - Process psutil.Process(pid=7437, status='terminated', exitcode=0, started='09:03:54') (7437) terminated with exit code 0
[2020-10-26 09:04:07,128] {{process_utils.py:68}} INFO - Process psutil.Process(pid=7775, status='terminated', started='09:04:05') (7775) terminated with exit code None
[2020-10-26 09:04:07,129] {{process_utils.py:68}} INFO - Process psutil.Process(pid=7773, status='terminated', started='09:04:05') (7773) terminated with exit code None
[2020-10-26 09:04:07,129] {{process_utils.py:68}} INFO - Process psutil.Process(pid=7782, status='terminated', started='09:04:05') (7782) terminated with exit code None
```

</details>

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
