ashb opened a new pull request, #28309:
URL: https://github.com/apache/airflow/pull/28309
There have been multiple reports of tasks stuck in the running state with
no obvious activity from the task itself, even though the supervisor is
still actively heartbeating.

To make it easier (or even possible) to tell _where_ the process is stuck,
we add a SIGUSR2 handler to the task supervisor (deliberately inherited by
the actual task process itself) that prints the current stack trace on
receiving USR2. This is the same signal we use to trigger a debug dump in
the Scheduler.
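
The handler described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code from this PR. The helper names `dump_all_thread_stacks` and `install_debug_handler` are hypothetical. Only the handler name `sigusr2_debug_handler` and its `traceback.print_stack(f=stack)` call come from the trace in the example output. Walking every thread with `sys._current_frames()` is also an assumption, though it would explain why a `MainThread` header and the handler's own frame appear there.

```python
import signal
import sys
import threading
import traceback
from typing import Optional, TextIO


def dump_all_thread_stacks(out: Optional[TextIO] = None) -> None:
    """Write every live thread's name followed by its current stack to `out`."""
    if out is None:
        # Resolve at call time so that a redirected sys.stderr is honoured.
        out = sys.stderr
    # Map thread idents to readable names such as "MainThread".
    names = {t.ident: t.name for t in threading.enumerate()}
    # sys._current_frames() (a CPython detail) maps each thread ident to its topmost frame.
    for ident, frame in sys._current_frames().items():
        print(names.get(ident, f"<unknown thread {ident}>"), file=out)
        traceback.print_stack(f=frame, file=out)


def sigusr2_debug_handler(signum, frame) -> None:
    """Signal handler (body assumed): ignore the interrupted frame, dump every thread."""
    dump_all_thread_stacks()


def install_debug_handler() -> None:
    """Register the handler for SIGUSR2 (POSIX only; must be called from the main thread)."""
    # Signal dispositions set before fork() are inherited by the forked child,
    # so a supervisor that installs this also covers a task process it forks.
    signal.signal(signal.SIGUSR2, sigusr2_debug_handler)


if __name__ == "__main__":
    install_debug_handler()
    # Sending `kill -USR2 <pid>` from another shell has the same effect.
    signal.raise_signal(signal.SIGUSR2)
```

Python always runs signal handlers on the main thread. Dumping every thread, rather than only the interrupted `frame`, therefore still shows where a worker thread is blocked.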

Example output:
```
[2022-12-12 17:35:44,713] {task_command.py:388} INFO - Running <TaskInstance: example_bash_operator.run_after_loop __airflow_temporary_run_2022-12-12T17:35:03.278763+00:00__ [running]> on host sinope.
MainThread
  File "/home/ash/.virtualenvs/airflow/bin/airflow", line 33, in <module>
    sys.exit(load_entry_point('apache-airflow', 'console_scripts', 'airflow')())
  File "/home/ash/code/airflow/airflow/airflow/__main__.py", line 39, in main
    args.func(args)
  File "/home/ash/code/airflow/airflow/airflow/cli/cli_parser.py", line 52, in command
    return func(*args, **kwargs)
  File "/home/ash/code/airflow/airflow/airflow/utils/cli.py", line 108, in wrapper
    return f(*args, **kwargs)
  File "/home/ash/code/airflow/airflow/airflow/cli/commands/task_command.py", line 392, in task_run
    _run_task_by_selected_method(args, dag, ti)
  File "/home/ash/code/airflow/airflow/airflow/cli/commands/task_command.py", line 193, in _run_task_by_selected_method
    _run_task_by_local_task_job(args, ti)
  File "/home/ash/code/airflow/airflow/airflow/cli/commands/task_command.py", line 252, in _run_task_by_local_task_job
    run_job.run()
  File "/home/ash/code/airflow/airflow/airflow/jobs/base_job.py", line 258, in run
    self._execute()
  File "/home/ash/code/airflow/airflow/airflow/jobs/local_task_job.py", line 181, in _execute
    return_code = self.task_runner.return_code(timeout=max_wait_time)
  File "/home/ash/code/airflow/airflow/airflow/task/task_runner/standard_task_runner.py", line 141, in return_code
    self._rc = self.process.wait(timeout=timeout)
  File "/home/ash/.virtualenvs/airflow/lib/python3.10/site-packages/psutil/__init__.py", line 1265, in wait
    self._exitcode = self._proc.wait(timeout)
  File "/home/ash/.virtualenvs/airflow/lib/python3.10/site-packages/psutil/_pslinux.py", line 1642, in wrapper
    return fun(self, *args, **kwargs)
  File "/home/ash/.virtualenvs/airflow/lib/python3.10/site-packages/psutil/_pslinux.py", line 1848, in wait
    return _psposix.wait_pid(self.pid, timeout, self._name)
  File "/home/ash/.virtualenvs/airflow/lib/python3.10/site-packages/psutil/_psposix.py", line 132, in wait_pid
    interval = sleep(interval)
  File "/home/ash/.virtualenvs/airflow/lib/python3.10/site-packages/psutil/_psposix.py", line 110, in sleep
    _sleep(interval)
  File "/home/ash/code/airflow/airflow/airflow/jobs/local_task_job.py", line 132, in sigusr2_debug_handler
    traceback.print_stack(f=stack)
```
--
This is an automated message from the Apache Git Service.