ashb opened a new pull request, #28309:
URL: https://github.com/apache/airflow/pull/28309

   There have been multiple reports of tasks stuck in the running state
   with no obvious activity from the running task, even though the
   supervisor is still actively heartbeating.
   
   To make it easier (or even possible) to tell _where_ the process is
   stuck, we add a SIGUSR2 handler to the task supervisor (purposefully
   inherited by the actual task process itself) that prints the current
   stack trace on receiving USR2. This is the same signal we use to
   trigger a debug dump in the Scheduler.
   
   Example output
   
   ```
   [2022-12-12 17:35:44,713] {task_command.py:388} INFO - Running <TaskInstance: example_bash_operator.run_after_loop __airflow_temporary_run_2022-12-12T17:35:03.278763+00:00__ [running]> on host sinope.
   MainThread
     File "/home/ash/.virtualenvs/airflow/bin/airflow", line 33, in <module>
       sys.exit(load_entry_point('apache-airflow', 'console_scripts', 'airflow')())
     File "/home/ash/code/airflow/airflow/airflow/__main__.py", line 39, in main
       args.func(args)
     File "/home/ash/code/airflow/airflow/airflow/cli/cli_parser.py", line 52, in command
       return func(*args, **kwargs)
     File "/home/ash/code/airflow/airflow/airflow/utils/cli.py", line 108, in wrapper
       return f(*args, **kwargs)
     File "/home/ash/code/airflow/airflow/airflow/cli/commands/task_command.py", line 392, in task_run
       _run_task_by_selected_method(args, dag, ti)
     File "/home/ash/code/airflow/airflow/airflow/cli/commands/task_command.py", line 193, in _run_task_by_selected_method
       _run_task_by_local_task_job(args, ti)
     File "/home/ash/code/airflow/airflow/airflow/cli/commands/task_command.py", line 252, in _run_task_by_local_task_job
       run_job.run()
     File "/home/ash/code/airflow/airflow/airflow/jobs/base_job.py", line 258, in run
       self._execute()
     File "/home/ash/code/airflow/airflow/airflow/jobs/local_task_job.py", line 181, in _execute
       return_code = self.task_runner.return_code(timeout=max_wait_time)
     File "/home/ash/code/airflow/airflow/airflow/task/task_runner/standard_task_runner.py", line 141, in return_code
       self._rc = self.process.wait(timeout=timeout)
     File "/home/ash/.virtualenvs/airflow/lib/python3.10/site-packages/psutil/__init__.py", line 1265, in wait
       self._exitcode = self._proc.wait(timeout)
     File "/home/ash/.virtualenvs/airflow/lib/python3.10/site-packages/psutil/_pslinux.py", line 1642, in wrapper
       return fun(self, *args, **kwargs)
     File "/home/ash/.virtualenvs/airflow/lib/python3.10/site-packages/psutil/_pslinux.py", line 1848, in wait
       return _psposix.wait_pid(self.pid, timeout, self._name)
     File "/home/ash/.virtualenvs/airflow/lib/python3.10/site-packages/psutil/_psposix.py", line 132, in wait_pid
       interval = sleep(interval)
     File "/home/ash/.virtualenvs/airflow/lib/python3.10/site-packages/psutil/_psposix.py", line 110, in sleep
       _sleep(interval)
     File "/home/ash/code/airflow/airflow/airflow/jobs/local_task_job.py", line 132, in sigusr2_debug_handler
       traceback.print_stack(f=stack)
   ```
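
For readers unfamiliar with the technique, here is a minimal sketch of a handler of this kind. It is not the code from this PR: only the name `sigusr2_debug_handler` and the `traceback.print_stack(f=stack)` call appear in the trace above. Everything else is an assumption made for illustration, including the use of `sys._current_frames()`, the thread-name lookup, writing to stderr, and the hypothetical `install_sigusr2_handler` helper. It also assumes a POSIX platform, since `SIGUSR2` does not exist on Windows.

```python
import signal
import sys
import threading
import traceback


def sigusr2_debug_handler(signum, frame):
    """Print the name and current stack of every thread to stderr."""
    # Map thread idents to readable names such as "MainThread".
    id_to_name = {th.ident: th.name for th in threading.enumerate()}
    # sys._current_frames() returns {thread_id: topmost frame} for all threads.
    for thread_id, stack in sys._current_frames().items():
        print(id_to_name.get(thread_id, thread_id), file=sys.stderr)
        traceback.print_stack(f=stack, file=sys.stderr)


def install_sigusr2_handler():
    """Hypothetical helper: register the handler (must run in the main thread)."""
    signal.signal(signal.SIGUSR2, sigusr2_debug_handler)
```

After calling `install_sigusr2_handler()` in a process, sending it the signal (for example `kill -USR2 <pid>`) should produce a dump shaped like the example output. Children created with `fork()` keep the handler, which matches the inheritance behaviour described above.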
   


