SchernHe commented on issue #24466: URL: https://github.com/apache/airflow/issues/24466#issuecomment-1303197912
Is there any news on this issue? I am also seeing similar behavior with a long-running task. My DAG includes an SSHOperator that executes a long-running process on a Google Cloud Compute Engine instance. The task keeps running even though the underlying process gets killed for some reason along the way. The weird thing is that I do not see any log message stating why the process was killed; it just stops logging.

## General Information

- Version: apache-airflow==2.4.2
- Executor: CeleryKubernetesExecutor + Redis
- Docker setup as described [here](https://airflow.apache.org/docs/apache-airflow/stable/howto/docker-compose/index.html)
- The Compute Engine instance keeps running and no reboot occurred
- The Compute Engine instance shows no spikes or anything unusual in CPU / GPU usage

## Task Logs

Here are the logs of the latest execution:

```
*** Reading local file: /opt/airflow/logs/dag_id=XXXX/run_id=manual__2022-11-03T22:37:42.838999+00:00/task_id=XXXX/attempt=1.log
[2022-11-04, 01:31:11 UTC] {taskinstance.py:1165} INFO - Dependencies all met for <TaskInstance: complete_training_pipeline.XXXX manual__2022-11-03T22:37:42.838999+00:00 [queued]>
[2022-11-04, 01:31:11 UTC] {taskinstance.py:1165} INFO - Dependencies all met for <TaskInstance: complete_training_pipeline.XXXX manual__2022-11-03T22:37:42.838999+00:00 [queued]>
[2022-11-04, 01:31:11 UTC] {taskinstance.py:1362} INFO - --------------------------------------------------------------------------------
[2022-11-04, 01:31:11 UTC] {taskinstance.py:1363} INFO - Starting attempt 1 of 1
[2022-11-04, 01:31:11 UTC] {taskinstance.py:1364} INFO - --------------------------------------------------------------------------------
[2022-11-04, 01:31:11 UTC] {taskinstance.py:1383} INFO - Executing <Task(SSHOperator): XXXX> on 2022-11-03 22:37:42.838999+00:00
[2022-11-04, 01:31:11 UTC] {standard_task_runner.py:55} INFO - Started process 18209 to run task
[2022-11-04, 01:31:11 UTC] {standard_task_runner.py:82} INFO - Running: ['***', 'tasks', 'run', 'XXXX', 'XXXX', 'manual__2022-11-03T22:37:42.838999+00:00', '--job-id', '214', '--raw', '--subdir', 'DAGS_FOLDER/XXXX.py', '--cfg-path', '/tmp/tmp1y9_3kwh']
[2022-11-04, 01:31:11 UTC] {standard_task_runner.py:83} INFO - Job 214: Subtask XXXX
[2022-11-04, 01:31:11 UTC] {task_command.py:376} INFO - Running <TaskInstance: XXXX.XXXX manual__2022-11-03T22:37:42.838999+00:00 [running]> on host 6eac1f0ec121
[2022-11-04, 01:31:12 UTC] {taskinstance.py:1590} INFO - Exporting the following env vars: AIRFLOW_CTX_DAG_OWNER=*** AIRFLOW_CTX_DAG_ID=XXXX AIRFLOW_CTX_TASK_ID=XXXX AIRFLOW_CTX_EXECUTION_DATE=2022-11-03T22:37:42.838999+00:00 AIRFLOW_CTX_TRY_NUMBER=1 AIRFLOW_CTX_DAG_RUN_ID=manual__2022-11-03T22:37:42.838999+00:00
[2022-11-04, 01:31:12 UTC] {ssh.py:137} INFO - Creating ssh_client
...
[2022-11-04, 01:31:19 UTC] {transport.py:1874} INFO - Connected (version 2.0, client OpenSSH_7.9p1)
[2022-11-04, 01:31:20 UTC] {transport.py:1874} INFO - Authentication (publickey) successful!
[2022-11-04, 01:31:20 UTC] {ssh.py:469} INFO - Running command: .....
....
[2022-11-04, 02:11:20 UTC] {ssh.py:501} INFO - "My last log message"
[2022-11-04, 07:32:16 UTC] {taskinstance.py:1155} INFO - Dependencies not met for <TaskInstance: complete_training_pipeline.start_bbox_training manual__2022-11-03T22:37:42.838999+00:00 [running]>, dependency 'Task Instance Not Running' FAILED: Task is in the running state
[2022-11-04, 07:32:16 UTC] {taskinstance.py:1155} INFO - Dependencies not met for <TaskInstance: complete_training_pipeline.start_bbox_training manual__2022-11-03T22:37:42.838999+00:00 [running]>, dependency 'Task Instance State' FAILED: Task is in the 'running' state.
[2022-11-04, 07:32:16 UTC] {local_task_job.py:103} INFO - Task is not able to be run
```

## Scheduler Logs

The CPU utilization of the instance shows that the task stopped somewhere around *02:40:00 - 02:45:00 UTC*, but the scheduler logs do not show anything around that time; only the periodic `[2022-11-04 02:45:37,716] {scheduler_job.py:1381} INFO - Resetting orphaned tasks for active dag runs` message appears.
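For reference, here is a minimal sketch of the kind of setup described above; it is not the actual pipeline, and the DAG id, task id, connection id, and command are placeholders:

```python
import pendulum

from airflow import DAG
from airflow.providers.ssh.operators.ssh import SSHOperator

with DAG(
    dag_id="complete_training_pipeline_sketch",   # placeholder name
    start_date=pendulum.datetime(2022, 11, 1, tz="UTC"),
    schedule=None,
    catchup=False,
) as dag:
    # Long-running command executed over SSH on the Compute Engine instance.
    start_bbox_training = SSHOperator(
        task_id="start_bbox_training",
        ssh_conn_id="gce_ssh",              # placeholder connection to the GCE VM
        command="python train.py ...",      # placeholder long-running process
        get_pty=True,                       # request a pseudo-terminal on the remote side
    )
```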
