anu251989 opened a new issue, #24498: URL: https://github.com/apache/airflow/issues/24498
### Apache Airflow version

2.2.5

### What happened

When the airflow-redis-master pod is deployed after the Celery worker pods, the worker pods cannot process any tasks until they are restarted manually.

I killed the airflow-redis-master pod: the worker pods lost their connection to it and stopped processing tasks until I restarted them manually. The last message in the worker logs is a missed heartbeat from another worker pod. We see this issue in version 2.2.5 and did not see it in version 1.10.12.

    [2022-05-30 17:53:18,213: INFO/MainProcess] Connected to redis://:**@airflow-redis-master.auto1.svc.cluster.local:6379/1
    [2022-05-30 17:53:18,228: INFO/MainProcess] mingle: searching for neighbors
    [2022-05-30 17:53:19,239: INFO/MainProcess] mingle: all alone
    [2022-05-30 17:53:24,246: INFO/MainProcess] missed heartbeat from celery@airflow-worker-0

We updated the configuration below, but it did not help:

    AIRFLOW__CELERY_BROKER_TRANSPORT_OPTIONS__MAX_RETRIES=6
    AIRFLOW__CELERY_BROKER_CONNECTION_TIMEOUT=60
    AIRFLOW_CELERY_BROKER_HEARTBEAT=360
    broker_connection_timeout=

As a workaround, we configured a liveness check for the worker pods with the following command:

    celery --app airflow.executors.celery_executor.app inspect ping

However, the pods restart only when the health check fails for all worker nodes. If only one worker's health check fails, the liveness probe still treats that pod as healthy, because the command gets a response from a healthy worker pod.

### What you think should happen instead

The worker pods should re-establish their connection to the airflow-redis-master node once the Redis pod is back up.

### How to reproduce

Delete the airflow-redis-master pod and monitor the worker logs. After some time, a missed heartbeat message appears in the logs and the worker pods can no longer process any tasks.
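The liveness-probe limitation described under "What happened" comes from the ping being broadcast to every worker, so a reply from any healthy worker counts as success. A minimal sketch of a per-worker check follows. It assumes each worker keeps Celery's default node name `celery@<hostname>` (consistent with `celery@airflow-worker-0` in the logs) and that `celery inspect ping` exits non-zero when the targeted node does not reply. The helper names `build_ping_command` and `worker_is_alive` are hypothetical and not part of Airflow or Celery.

```python
import socket
import subprocess
from typing import List, Optional

# App path copied from the liveness command quoted in the report.
APP = "airflow.executors.celery_executor.app"


def build_ping_command(hostname: Optional[str] = None) -> List[str]:
    """Build a `celery inspect ping` command aimed at one worker node only."""
    host = hostname or socket.gethostname()
    # Assumption: the worker uses the default node name celery@<hostname>.
    return [
        "celery", "--app", APP,
        "inspect", "ping",
        "--destination", f"celery@{host}",
    ]


def worker_is_alive(hostname: Optional[str] = None, timeout: float = 30.0) -> bool:
    """Return True only if the targeted worker answered the ping."""
    try:
        result = subprocess.run(
            build_ping_command(hostname),
            capture_output=True,
            timeout=timeout,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        # Missing celery binary or a hung broker connection counts as unhealthy.
        return False
    # Assumption: a non-zero exit code means the targeted node did not reply.
    return result.returncode == 0


if __name__ == "__main__":
    # Exit code 0 marks the pod healthy, 1 asks Kubernetes to restart it.
    raise SystemExit(0 if worker_is_alive() else 1)
```

Restricting the ping with `--destination` means replies from other workers no longer mask a worker that has lost its broker connection. Whether this fully covers the Redis restart case would need to be verified in the affected cluster.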
### Operating System

"Debian GNU/Linux 10 (buster)"

### Versions of Apache Airflow Providers

    apache-airflow-providers-amazon==3.2.0
    apache-airflow-providers-celery==2.1.3
    apache-airflow-providers-cncf-kubernetes==3.0.0
    apache-airflow-providers-docker==2.5.2
    apache-airflow-providers-elasticsearch==2.2.0
    apache-airflow-providers-ftp==2.1.2
    apache-airflow-providers-google==6.7.0
    apache-airflow-providers-grpc==2.0.4
    apache-airflow-providers-hashicorp==2.1.4
    apache-airflow-providers-http==2.1.2
    apache-airflow-providers-imap==2.2.3
    apache-airflow-providers-microsoft-azure==3.7.2
    apache-airflow-providers-mysql==2.2.3
    apache-airflow-providers-odbc==2.0.4
    apache-airflow-providers-postgres==4.1.0
    apache-airflow-providers-redis==2.0.4
    apache-airflow-providers-sendgrid==2.0.4
    apache-airflow-providers-sftp==2.5.2
    apache-airflow-providers-slack==4.2.3
    apache-airflow-providers-sqlite==2.1.3
    apache-airflow-providers-ssh==2.4.3

### Deployment

Official Apache Airflow Helm Chart

### Deployment details

https://github.com/apache/airflow

### Anything else

_No response_

### Are you willing to submit PR?

- [X] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
