easontm opened a new issue #22068:
URL: https://github.com/apache/airflow/issues/22068


   ### Apache Airflow version
   
   2.2.3
   
   ### What happened
   
   As far as I can tell, this is the series of events:
   1. Worker c8 receives the message
   2. Worker c8 starts processing
   3. Worker hl receives a separate message with the same payload but an incremented try number
   4. Worker c8 experiences a hostname mismatch
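
   To illustrate why both messages get through (this is a hypothetical sketch, not Airflow's actual source): if queued work is deduplicated by the full `TaskInstanceKey`, which includes `try_number`, then two schedulers sending the same logical task with different try numbers both pass the "already queued?" check, matching the scheduler logs below.

   ```python
   # Hypothetical sketch: dedup keyed on (dag_id, task_id, run_id, try_number)
   # does not stop a second scheduler from sending the same task with a
   # different try_number.
   from typing import NamedTuple

   class TaskInstanceKey(NamedTuple):
       dag_id: str
       task_id: str
       run_id: str
       try_number: int

   queued: set = set()

   def try_enqueue(key: TaskInstanceKey) -> bool:
       """Enqueue the key unless an identical key was already sent."""
       if key in queued:
           return False
       queued.add(key)
       return True

   # scheduler1 sends try 1; scheduler2 sends try 2: both pass the check
   first = try_enqueue(TaskInstanceKey("MY_DAG", "MY_TASK", "scheduled__...", 1))
   second = try_enqueue(TaskInstanceKey("MY_DAG", "MY_TASK", "scheduled__...", 2))
   ```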
   
   
   Scheduler logs
   ```
   2022-03-07T21:19:40.926Z     scheduler2      Setting external_id for 
<TaskInstance: MY_DAG.MY_TASK scheduled__2022-03-07T12:01:00+00:00 [queued]> to 
8e41d584-1775-4b3f-8621-7fb379249b35
   2022-03-07T21:19:40.926Z     scheduler2      Adding to queue: ['airflow', 
'tasks', 'run', 'MY_DAG', 'MY_TASK', 'scheduled__2022-03-07T12:01:00+00:00', 
'--local', '--subdir', 'DAGS_FOLDER/MY_DAG.py']
   2022-03-07T21:19:40.926Z     scheduler2      Sending 
TaskInstanceKey(dag_id='MY_DAG', task_id='MY_TASK', 
run_id='scheduled__2022-03-07T12:01:00+00:00', try_number=2) to executor with 
priority 4 and queue default
   2022-03-07T21:19:39.042Z     scheduler1      Setting external_id for 
<TaskInstance: MY_DAG.MY_TASK scheduled__2022-03-07T12:01:00+00:00 [scheduled]> 
to b742838e-81b0-4bd6-87d7-eb910f3c787c
   2022-03-07T21:19:37.262Z     scheduler1      Adding to queue: ['airflow', 
'tasks', 'run', 'MY_DAG', 'MY_TASK', 'scheduled__2022-03-07T12:01:00+00:00', 
'--local', '--subdir', 'DAGS_FOLDER/MY_DAG.py']
   2022-03-07T21:19:37.262Z     scheduler1      Sending 
TaskInstanceKey(dag_id='MY_DAG', task_id='MY_TASK', 
run_id='scheduled__2022-03-07T12:01:00+00:00', try_number=1) to executor with 
priority 4 and queue default
   ```
   
   Messages, truncated and reformatted for readability (from Flower)
   ```
   FAILED     b742838e-81b0-4bd6-87d7-eb910f3c787c     'MY_DAG', 'MY_TASK', 
'scheduled__2022-03-07T12:01:00+00:00'     2022-03-07 21:19:36.975    Worker-c8
   SUCCEEDED  8e41d584-1775-4b3f-8621-7fb379249b35     'MY_DAG', 'MY_TASK', 
'scheduled__2022-03-07T12:01:00+00:00'     2022-03-07 21:19:40.150    Worker-hl
   ```
   
   Worker c8
   ```
   [2022-03-07 21:19:36,983: INFO/ForkPoolWorker-2] Celery task ID: 
b742838e-81b0-4bd6-87d7-eb910f3c787c
   [2022-03-07 21:19:37,401: WARNING/ForkPoolWorker-2] Running <TaskInstance: 
MY_DAG.MY_TASK scheduled__2022-03-07T12:01:00+00:00 [queued]> on host 
airflow-celery-worker-6bb54c97cb-c8wk7
   [2022-03-07 21:19:36,983: INFO/ForkPoolWorker-2] Executing command in 
Celery: ['airflow', 'tasks', 'run', 'MY_DAG', 'MY_TASK', 
'scheduled__2022-03-07T12:01:00+00:00', '--local', '--subdir', 
'DAGS_FOLDER/MY_DAG.py']
   [2022-03-07 21:19:44,642: ERROR/ForkPoolWorker-2] Failed to execute task 
Hostname of job runner does not match.
   ```
   
   Worker hl
   ```
   [2022-03-07 21:19:40,158: INFO/ForkPoolWorker-16] Celery task ID: 
8e41d584-1775-4b3f-8621-7fb379249b35
   [2022-03-07 21:19:40,558: WARNING/ForkPoolWorker-16] Running <TaskInstance: 
MY_DAG.MY_TASK scheduled__2022-03-07T12:01:00+00:00 [queued]> on host 
airflow-celery-worker-6bb54c97cb-hll5d
   [2022-03-07 21:19:40,158: INFO/ForkPoolWorker-16] Executing command in 
Celery: ['airflow', 'tasks', 'run', 'MY_DAG', 'MY_TASK', 
'scheduled__2022-03-07T12:01:00+00:00', '--local', '--subdir', 
'DAGS_FOLDER/MY_DAG.py']
   ```
   Then, there are two task instance logs:
   ```
   [2022-03-07, 21:19:38 UTC] {standard_task_runner.py:76} INFO - Running: 
['airflow', 'tasks', 'run', 'MY_DAG', 'MY_TASK', 
'scheduled__2022-03-07T12:01:00+00:00', '--job-id', '6537153', '--raw', 
'--subdir', 'DAGS_FOLDER/MY_DAG.py', '--cfg-path', '/tmp/tmpzubhcaom', 
'--error-file', '/tmp/tmpuvg0i7va']
   [2022-03-07, 21:19:38 UTC] {standard_task_runner.py:77} INFO - Job 6537153: 
Subtask MY_TASK
   [2022-03-07, 21:19:43 UTC] {local_task_job.py:194} WARNING - The recorded 
hostname airflow-celery-worker-6bb54c97cb-hll5d does not match this instance's 
hostname airflow-celery-worker-6bb54c97cb-c8wk7
   [2022-03-07, 21:19:43 UTC] {process_utils.py:124} INFO - Sending 
Signals.SIGTERM to group 4938. PIDs of all processes in the group: [4938]
   [2022-03-07, 21:19:43 UTC] {process_utils.py:75} INFO - Sending the signal 
Signals.SIGTERM to group 4938
   .
   [2022-03-07, 21:19:39 UTC] {logging_mixin.py:109} INFO - Running 
<TaskInstance: MY_DAG.MY_TASK scheduled__2022-03-07T12:01:00+00:00 [scheduled]> 
on host airflow-celery-worker-6bb54c97cb-c8wk7
   ...blah blah blah...
   [2022-03-07, 21:19:45 UTC] {local_task_job.py:212} WARNING - State of this 
instance has been externally set to up_for_retry. Terminating instance.
   [2022-03-07, 21:19:45 UTC] {process_utils.py:124} INFO - Sending 
Signals.SIGTERM to group 8554. PIDs of all processes in the group: [8554]
   [2022-03-07, 21:19:45 UTC] {process_utils.py:75} INFO - Sending the signal 
Signals.SIGTERM to group 8554
   ```
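
   For context, the "recorded hostname ... does not match" warning comes from the local task job's heartbeat check. A minimal sketch of that check (a simplification, not Airflow's actual implementation): the hostname recorded on the task instance row is compared against the hostname of the machine running the heartbeat, and the task is terminated when another worker has claimed it.

   ```python
   # Simplified sketch of the heartbeat hostname check (not Airflow source).
   import socket
   from typing import Optional

   def check_hostname(recorded_hostname: str, this_hostname: Optional[str] = None) -> bool:
       """Return True if the task may keep running on this host."""
       if this_hostname is None:
           this_hostname = socket.getfqdn()
       if recorded_hostname != this_hostname:
           # In Airflow this warning is followed by SIGTERM to the task group.
           print(
               f"The recorded hostname {recorded_hostname} does not match "
               f"this instance's hostname {this_hostname}"
           )
           return False
       return True
   ```

   Under the duplicate-delivery race above, each worker can see the other worker's hostname recorded on the shared task instance row, which would explain both workers reporting a mismatch.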
   
   ### What you expected to happen
   
   _No response_
   
   ### How to reproduce
   
   I can't reproduce this on demand, but it happens roughly once a day, specifically for this hourly task.
   
   ### Operating System
   
   Ubuntu 19.10
   
   ### Versions of Apache Airflow Providers
   
   _No response_
   
   ### Deployment
   
   Other Docker-based deployment
   
   ### Deployment details
   
   CeleryKubernetesExecutor
   Celery Queue: RabbitMQ
   2 schedulers
   
   worker_prefetch_multiplier is not set (defaults to 1)
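
   For reference, if that setting were to be pinned explicitly, it would go in the `[celery]` section (sketch only; verify the option name against your Airflow version's configuration reference):

   ```ini
   # airflow.cfg — illustrative fragment, values are this deployment's defaults
   [celery]
   # How many messages each worker process prefetches from the broker.
   # Not set in this deployment, so the default of 1 applies.
   worker_prefetch_multiplier = 1
   ```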
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

