easontm opened a new issue #22068:
URL: https://github.com/apache/airflow/issues/22068
### Apache Airflow version
2.2.3
### What happened
As far as I can tell, this is the series of events:
1. Worker c8 receives the message
2. Worker c8 starts processing
3. Worker hl receives a separate message with the same payload but an incremented try number
4. Worker c8 experiences a hostname mismatch
Scheduler logs:
```
2022-03-07T21:19:40.926Z scheduler2 Setting external_id for
<TaskInstance: MY_DAG.MY_TASK scheduled__2022-03-07T12:01:00+00:00 [queued]> to
8e41d584-1775-4b3f-8621-7fb379249b35
2022-03-07T21:19:40.926Z scheduler2 Adding to queue: ['airflow',
'tasks', 'run', 'MY_DAG', 'MY_TASK', 'scheduled__2022-03-07T12:01:00+00:00',
'--local', '--subdir', 'DAGS_FOLDER/MY_DAG.py']
2022-03-07T21:19:40.926Z scheduler2 Sending
TaskInstanceKey(dag_id='MY_DAG', task_id='MY_TASK',
run_id='scheduled__2022-03-07T12:01:00+00:00', try_number=2) to executor with
priority 4 and queue default
2022-03-07T21:19:39.042Z scheduler1 Setting external_id for
<TaskInstance: MY_DAG.MY_TASK scheduled__2022-03-07T12:01:00+00:00 [scheduled]>
to b742838e-81b0-4bd6-87d7-eb910f3c787c
2022-03-07T21:19:37.262Z scheduler1 Adding to queue: ['airflow',
'tasks', 'run', 'MY_DAG', 'MY_TASK', 'scheduled__2022-03-07T12:01:00+00:00',
'--local', '--subdir', 'DAGS_FOLDER/MY_DAG.py']
2022-03-07T21:19:37.262Z scheduler1 Sending
TaskInstanceKey(dag_id='MY_DAG', task_id='MY_TASK',
run_id='scheduled__2022-03-07T12:01:00+00:00', try_number=1) to executor with
priority 4 and queue default
```
Messages from Flower, truncated and formatted for readability:
```
FAILED b742838e-81b0-4bd6-87d7-eb910f3c787c 'MY_DAG', 'MY_TASK',
'scheduled__2022-03-07T12:01:00+00:00' 2022-03-07 21:19:36.975 Worker-c8
SUCCEEDED 8e41d584-1775-4b3f-8621-7fb379249b35 'MY_DAG', 'MY_TASK',
'scheduled__2022-03-07T12:01:00+00:00' 2022-03-07 21:19:40.150 Worker-hl
```
Worker c8:
```
[2022-03-07 21:19:36,983: INFO/ForkPoolWorker-2] Celery task ID:
b742838e-81b0-4bd6-87d7-eb910f3c787c
[2022-03-07 21:19:37,401: WARNING/ForkPoolWorker-2] Running <TaskInstance:
MY_DAG.MY_TASK scheduled__2022-03-07T12:01:00+00:00 [queued]> on host
airflow-celery-worker-6bb54c97cb-c8wk7
[2022-03-07 21:19:36,983: INFO/ForkPoolWorker-2] Executing command in
Celery: ['airflow', 'tasks', 'run', 'MY_DAG', 'MY_TASK',
'scheduled__2022-03-07T12:01:00+00:00', '--local', '--subdir',
'DAGS_FOLDER/MY_DAG.py']
[2022-03-07 21:19:44,642: ERROR/ForkPoolWorker-2] Failed to execute task
Hostname of job runner does not match.
```
Worker hl:
```
[2022-03-07 21:19:40,158: INFO/ForkPoolWorker-16] Celery task ID:
8e41d584-1775-4b3f-8621-7fb379249b35
[2022-03-07 21:19:40,558: WARNING/ForkPoolWorker-16] Running <TaskInstance:
MY_DAG.MY_TASK scheduled__2022-03-07T12:01:00+00:00 [queued]> on host
airflow-celery-worker-6bb54c97cb-hll5d
[2022-03-07 21:19:40,158: INFO/ForkPoolWorker-16] Executing command in
Celery: ['airflow', 'tasks', 'run', 'MY_DAG', 'MY_TASK',
'scheduled__2022-03-07T12:01:00+00:00', '--local', '--subdir',
'DAGS_FOLDER/MY_DAG.py']
```

Then, there are two task instance logs:

```
[2022-03-07, 21:19:38 UTC] {standard_task_runner.py:76} INFO - Running:
['airflow', 'tasks', 'run', 'MY_DAG', 'MY_TASK',
'scheduled__2022-03-07T12:01:00+00:00', '--job-id', '6537153', '--raw',
'--subdir', 'DAGS_FOLDER/MY_DAG.py', '--cfg-path', '/tmp/tmpzubhcaom',
'--error-file', '/tmp/tmpuvg0i7va']
[2022-03-07, 21:19:38 UTC] {standard_task_runner.py:77} INFO - Job 6537153:
Subtask MY_TASK
[2022-03-07, 21:19:43 UTC] {local_task_job.py:194} WARNING - The recorded
hostname airflow-celery-worker-6bb54c97cb-hll5d does not match this instance's
hostname airflow-celery-worker-6bb54c97cb-c8wk7
[2022-03-07, 21:19:43 UTC] {process_utils.py:124} INFO - Sending
Signals.SIGTERM to group 4938. PIDs of all processes in the group: [4938]
[2022-03-07, 21:19:43 UTC] {process_utils.py:75} INFO - Sending the signal
Signals.SIGTERM to group 4938
```

```
[2022-03-07, 21:19:39 UTC] {logging_mixin.py:109} INFO - Running
<TaskInstance: MY_DAG.MY_TASK scheduled__2022-03-07T12:01:00+00:00 [scheduled]>
on host airflow-celery-worker-6bb54c97cb-c8wk7
...blah blah blah...
[2022-03-07, 21:19:45 UTC] {local_task_job.py:212} WARNING - State of this
instance has been externally set to up_for_retry. Terminating instance.
[2022-03-07, 21:19:45 UTC] {process_utils.py:124} INFO - Sending
Signals.SIGTERM to group 8554. PIDs of all processes in the group: [8554]
[2022-03-07, 21:19:45 UTC] {process_utils.py:75} INFO - Sending the signal
Signals.SIGTERM to group 8554
```
### What you expected to happen
_No response_
### How to reproduce
I can't reproduce this on demand, but it happens roughly once a day, specifically for this hourly task.
### Operating System
Ubuntu 19.10
### Versions of Apache Airflow Providers
_No response_
### Deployment
Other Docker-based deployment
### Deployment details
CeleryKubernetesExecutor
Celery Queue: RabbitMQ
2 schedulers
`worker_prefetch_multiplier` is not set (defaults to 1)
### Anything else
_No response_
### Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
### Code of Conduct
- [X] I agree to follow this project's [Code of
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)