george-zubrienko opened a new issue #13677: URL: https://github.com/apache/airflow/issues/13677
Hi,

This code https://github.com/apache/airflow/blob/1d2977f6a4c67fa6174c79dcdc4e9ee3ce06f1b1/chart/templates/scheduler/scheduler-deployment.yaml#L138 causes scheduler pods to restart randomly when more than one scheduler replica is running. The liveness probe can end up checking the scheduler job of an arbitrary host instead of the pod's own hostname.

I suggest this change, which only reports the pod as healthy if a running scheduler job registered under the pod's own hostname is alive:

```
livenessProbe:
  exec:
    command:
      - python
      - '-Wignore'
      - '-c'
      - |
        import os
        import sys

        # Lower Airflow's log level before any airflow module is imported,
        # so the probe output is not flooded with log lines.
        os.environ['AIRFLOW__CORE__LOGGING_LEVEL'] = 'ERROR'
        os.environ['AIRFLOW__LOGGING__LOGGING_LEVEL'] = 'ERROR'

        from airflow.jobs.scheduler_job import SchedulerJob
        from airflow.utils.net import get_hostname
        from airflow.utils.session import provide_session
        from airflow.utils.state import State


        @provide_session
        def all_running_jobs(session=None):
            return session.query(SchedulerJob).filter(SchedulerJob.state == State.RUNNING).all()


        all_active_schedulers = all_running_jobs()
        current_scheduler = get_hostname()

        # Exit 0 (healthy) only if this pod's own scheduler job is alive.
        for _job in all_active_schedulers:
            if _job.hostname == current_scheduler and _job.is_alive():
                sys.exit(0)
        sys.exit(1)
```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
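The hostname-matching rule in the proposed probe can be tried out without a database using the minimal sketch below. `FakeJob`, `probe_exit_code` and `RUNNING` are hypothetical stand-ins, not Airflow APIs. `FakeJob.is_alive` is a simplified assumption modeled on "still running and heartbeated recently", with a 30-second threshold assumed here rather than read from Airflow's configuration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable

# Hypothetical state value standing in for airflow.utils.state.State.RUNNING.
RUNNING = "running"


@dataclass
class FakeJob:
    """Hypothetical stand-in for a SchedulerJob row (not Airflow's class)."""

    hostname: str
    state: str
    latest_heartbeat: datetime

    def is_alive(self, now: datetime, threshold_s: float = 30.0) -> bool:
        # Simplified assumption: alive = marked running and heartbeat is fresh.
        age_s = (now - self.latest_heartbeat).total_seconds()
        return self.state == RUNNING and age_s < threshold_s


def probe_exit_code(jobs: Iterable[FakeJob], hostname: str, now: datetime) -> int:
    """Return 0 (healthy) only if a running job on *this* host is alive, else 1."""
    for job in jobs:
        if job.state == RUNNING and job.hostname == hostname and job.is_alive(now):
            return 0
    return 1


if __name__ == "__main__":
    now = datetime(2021, 1, 1, 12, 0, tzinfo=timezone.utc)
    jobs = [
        FakeJob("scheduler-0", RUNNING, now - timedelta(seconds=5)),    # fresh heartbeat
        FakeJob("scheduler-1", RUNNING, now - timedelta(seconds=120)),  # stale heartbeat
    ]
    print(probe_exit_code(jobs, "scheduler-0", now))  # 0
    print(probe_exit_code(jobs, "scheduler-1", now))  # 1
    print(probe_exit_code(jobs, "scheduler-2", now))  # 1, no job for this host
```

Because the loop only accepts a job whose hostname matches, a healthy job on another replica cannot make this pod pass, and a stale job on another replica cannot make it fail.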
