george-zubrienko opened a new issue #13677:
URL: https://github.com/apache/airflow/issues/13677


   Hi,
   This code
   
   
https://github.com/apache/airflow/blob/1d2977f6a4c67fa6174c79dcdc4e9ee3ce06f1b1/chart/templates/scheduler/scheduler-deployment.yaml#L138
   
   causes scheduler pods to restart randomly because the liveness probe hits a 
random hostname when more than one scheduler replica is running. Suggested 
change:
   
   ```
             livenessProbe:
               exec:
                 command:
                   - python
                   - '-Wignore'
                   - '-c'
                   - >
   
                     import sys
   
                     import os 
   
                     from airflow.jobs.scheduler_job import SchedulerJob
   
                     from airflow.utils.session import provide_session
   
                     from airflow.utils.state import State 
   
                     from airflow.utils.net import get_hostname
   
                     @provide_session
   
                     def all_running_jobs(session=None):
                         return session.query(SchedulerJob).filter(SchedulerJob.state == State.RUNNING).all()
                         
                     os.environ['AIRFLOW__CORE__LOGGING_LEVEL'] = 'ERROR'
                     os.environ['AIRFLOW__LOGGING__LOGGING_LEVEL'] = 'ERROR'
   
                     all_active_schedulers = all_running_jobs() 
   
                     current_scheduler = get_hostname()
   
                     for _job in all_active_schedulers:
                         if _job.hostname == current_scheduler and _job.is_alive():
                             sys.exit(0)
   
                     sys.exit(1)
   ```
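
To illustrate the check the suggested probe performs, here is a minimal sketch of the same decision logic without a database. `FakeJob` and `probe_exit_code` are hypothetical names used only for illustration and are not part of Airflow; the sketch assumes each job row exposes a `hostname` and an `is_alive()` method, as the probe above does.

```python
from dataclasses import dataclass


@dataclass
class FakeJob:
    """Hypothetical stand-in for a scheduler job row; not Airflow's class."""
    hostname: str
    alive: bool

    def is_alive(self) -> bool:
        return self.alive


def probe_exit_code(jobs, current_hostname):
    """Return 0 if a live job belongs to this host, otherwise 1."""
    for job in jobs:
        # Only jobs recorded for this pod's hostname count.
        if job.hostname == current_hostname and job.is_alive():
            return 0
    return 1


# Two replicas: one healthy, one not.
jobs = [FakeJob("scheduler-0", True), FakeJob("scheduler-1", False)]
print(probe_exit_code(jobs, "scheduler-0"))  # prints 0
print(probe_exit_code(jobs, "scheduler-1"))  # prints 1
```

With this filter, the state of another replica's job should not affect the result for the pod running the probe.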



