anu251989 opened a new issue, #24498:
URL: https://github.com/apache/airflow/issues/24498

   ### Apache Airflow version
   
   2.2.5
   
   ### What happened
   
   The airflow-redis-master pod was deployed after the Celery worker pods, 
and the worker pods were then unable to process any tasks until they were 
manually restarted. The same happens when I kill the airflow-redis-master pod: 
the worker pods lose their connection to it and stop processing tasks until 
they are manually restarted.
   
   The logs show a missed heartbeat from another worker pod. We are facing 
this issue in version 2.2.5 and did not face it in version 1.10.12.
   
   The missed heartbeat is the last message in the logs:
   
   [2022-05-30 17:53:18,213: INFO/MainProcess] Connected to redis://:**@airflow-redis-master.auto1.svc.cluster.local:6379/1
   [2022-05-30 17:53:18,228: INFO/MainProcess] mingle: searching for neighbors
   [2022-05-30 17:53:19,239: INFO/MainProcess] mingle: all alone
   [2022-05-30 17:53:24,246: INFO/MainProcess] missed heartbeat from celery@airflow-worker-0
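
To spot this state without reading the logs by hand, a minimal sketch along these lines could check whether the last Celery log line of a worker is a missed heartbeat. The function names and the regular expression are assumptions made for illustration, based only on the log format in the excerpt above; they are not part of Airflow or Celery.

```python
import re

# Assumed format, taken from the excerpt above: "[timestamp: LEVEL/Process] message"
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+?): (?P<level>[A-Z]+)/(?P<proc>[^\]]+)\] (?P<msg>.*)$"
)


def last_message(lines):
    """Return the message text of the last parseable log line, or None."""
    last = None
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match:
            last = match.group("msg")
    return last


def stalled_after_missed_heartbeat(lines):
    """True if the final log message is a 'missed heartbeat' notice."""
    msg = last_message(lines)
    return msg is not None and msg.startswith("missed heartbeat from")
```

Feeding it the four log lines above returns `True`; whether that really means the worker has stopped consuming tasks still has to be confirmed separately.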
   
   We updated the configuration below, but it did not help.
   
   AIRFLOW__CELERY_BROKER_TRANSPORT_OPTIONS__MAX_RETRIES=6
   AIRFLOW__CELERY_BROKER_CONNECTION_TIMEOUT=60
   AIRFLOW_CELERY_BROKER_HEARTBEAT=360
   
   broker_connection_timeout=
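
Airflow documents environment overrides in the form `AIRFLOW__{SECTION}__{KEY}`, with a double underscore after `AIRFLOW` and another between section and key. As a rough sanity check, the sketch below tests variable names against that pattern. The helper name `parse_airflow_env` is hypothetical, and the check only looks at the shape of the name, not at whether the section or key exists in this Airflow version.

```python
import re

# Documented shape: AIRFLOW__{SECTION}__{KEY} (double underscores as separators)
ENV_PATTERN = re.compile(r"^AIRFLOW__(?P<section>[A-Z0-9_]+?)__(?P<key>[A-Z0-9_]+)$")


def parse_airflow_env(name):
    """Return (section, key) in lowercase, or None if the name has the wrong shape."""
    match = ENV_PATTERN.match(name)
    if match is None:
        return None
    return match.group("section").lower(), match.group("key").lower()


if __name__ == "__main__":
    for name in (
        "AIRFLOW__CELERY_BROKER_TRANSPORT_OPTIONS__MAX_RETRIES",
        "AIRFLOW__CELERY_BROKER_CONNECTION_TIMEOUT",
        "AIRFLOW_CELERY_BROKER_HEARTBEAT",
    ):
        print(name, "->", parse_airflow_env(name))
```

With the names exactly as written in this report, only the first one fits the pattern. If the other two are set that way in the deployment rather than being typos here, they would likely not be read as overrides.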
   
   As a workaround, we configured a liveness check for the worker pods with 
the command below.
   celery --app airflow.executors.celery_executor.app inspect ping
   However, the pods are restarted only when the health checks of all worker 
nodes fail. If just one worker fails its health check, the liveness probe 
still reports healthy because it gets a response from a healthy worker pod.
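
One way to keep a healthy neighbour from answering the probe would be to address the ping to the local worker only, using the `--destination` option of `celery inspect`. The sketch below assumes the worker uses the default node name `celery@<hostname>` (as in the log line `celery@airflow-worker-0`); the function names and the timeout value are illustrative, not taken from the Helm chart.

```python
import socket
import subprocess

APP = "airflow.executors.celery_executor.app"


def ping_command(hostname=None):
    """Build an inspect-ping command addressed to a single worker node."""
    host = hostname or socket.gethostname()
    return [
        "celery", "--app", APP,
        "inspect", "ping",
        "--destination", f"celery@{host}",  # assumes the default node name
    ]


def is_local_worker_alive(timeout=30):
    """Return True if the ping command exits with status 0 within the timeout."""
    try:
        result = subprocess.run(ping_command(), capture_output=True, timeout=timeout)
    except (subprocess.TimeoutExpired, FileNotFoundError):
        return False
    return result.returncode == 0
```

Because only the named node is asked to reply, a response from another worker pod no longer counts, so the probe of a disconnected worker should fail on its own.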
   
   
   
   
   
   ### What you think should happen instead
   
   The worker pods should re-establish their connection to the 
airflow-redis-master node once the Redis pod is back up.
   
   ### How to reproduce
   
   Delete the airflow-redis-master pod and monitor the worker logs. After 
some time, a missed heartbeat message appears in the logs and the worker pods 
are no longer able to process any tasks.
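
The reproduction steps can be scripted roughly as follows. This is only a sketch: the pod names and namespace are parameters because the exact names depend on the Helm release, and the function names are made up for illustration.

```python
import subprocess


def delete_pod_command(pod, namespace):
    """kubectl command that deletes one pod."""
    return ["kubectl", "delete", "pod", pod, "--namespace", namespace]


def follow_logs_command(pod, namespace):
    """kubectl command that streams the logs of one pod."""
    return ["kubectl", "logs", "--follow", pod, "--namespace", namespace]


def reproduce(redis_pod, worker_pod, namespace):
    """Delete the Redis pod, then stream the worker logs until interrupted."""
    subprocess.run(delete_pod_command(redis_pod, namespace), check=True)
    subprocess.run(follow_logs_command(worker_pod, namespace), check=True)
```

Calling `reproduce(...)` with the Redis and worker pod names of the affected release should end with the worker log stopping at a missed heartbeat line.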
   
   ### Operating System
   
   "Debian GNU/Linux 10 (buster)"
   
   ### Versions of Apache Airflow Providers
   
   apache-airflow-providers-amazon==3.2.0
   apache-airflow-providers-celery==2.1.3
   apache-airflow-providers-cncf-kubernetes==3.0.0
   apache-airflow-providers-docker==2.5.2
   apache-airflow-providers-elasticsearch==2.2.0
   apache-airflow-providers-ftp==2.1.2
   apache-airflow-providers-google==6.7.0
   apache-airflow-providers-grpc==2.0.4
   apache-airflow-providers-hashicorp==2.1.4
   apache-airflow-providers-http==2.1.2
   apache-airflow-providers-imap==2.2.3
   apache-airflow-providers-microsoft-azure==3.7.2
   apache-airflow-providers-mysql==2.2.3
   apache-airflow-providers-odbc==2.0.4
   apache-airflow-providers-postgres==4.1.0
   apache-airflow-providers-redis==2.0.4
   apache-airflow-providers-sendgrid==2.0.4
   apache-airflow-providers-sftp==2.5.2
   apache-airflow-providers-slack==4.2.3
   apache-airflow-providers-sqlite==2.1.3
   apache-airflow-providers-ssh==2.4.3
   
   
   ### Deployment
   
   Official Apache Airflow Helm Chart
   
   ### Deployment details
   
   https://github.com/apache/airflow
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

