pdellarciprete commented on issue #57618:
URL: https://github.com/apache/airflow/issues/57618#issuecomment-3691213811

   > [@pdellarciprete](https://github.com/pdellarciprete) Few questions:
   > 
   > 1. Is this only happening for KE? (don't think so based on code-paths but want to confirm)
   > 2. Could you upload logs from both schedulers when this happens? (full logs around the time of the race, not just excerpts)
   > 3. What is your `scheduler_health_check_threshold` setting? And how long do your tasks typically take to execute?
   > 4. In the Dec 15 logs showing the orphan reset - was Scheduler A actually unhealthy, or was it still running fine when Scheduler B marked its job as failed?
   
   Hello @kaxil,
   
   1. I have only noticed the issue on our KE instances, so I'd say yes.
   2. I need to retrieve the full logs, but I guess @ephraimbuddy should have an example as well, since they were able to reproduce it.
   3. The `scheduler_health_check_threshold` is the default.
   4. Scheduler A was healthy, but its SchedulerJob heartbeat was probably too old, so it was marked as failed.
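
To illustrate point 4, here is a minimal sketch of how a stale heartbeat alone can get a still-running scheduler's job treated as dead. This is not the actual Airflow code: the function name `is_considered_alive` and the timestamps are hypothetical, and the 30-second value is an assumption about the default of `scheduler_health_check_threshold`.

```python
from datetime import datetime, timedelta, timezone

# Assumed default of [scheduler] scheduler_health_check_threshold, in seconds.
HEALTH_CHECK_THRESHOLD = 30


def is_considered_alive(latest_heartbeat, now, threshold_seconds=HEALTH_CHECK_THRESHOLD):
    """Hypothetical check: a job counts as alive only if its last
    recorded heartbeat is no older than the threshold."""
    return now - latest_heartbeat <= timedelta(seconds=threshold_seconds)


# Arbitrary illustrative timestamp, not taken from the real logs.
now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)

# Scheduler A's process is still running, but its last recorded
# heartbeat is 45 seconds old (for example, delayed by a long loop).
heartbeat_a = now - timedelta(seconds=45)

# Scheduler B only sees the stale timestamp, not the live process.
print(is_considered_alive(heartbeat_a, now))
```

Under these assumptions, the other scheduler would see Scheduler A's job as past the threshold and could mark it as failed even though the process itself is healthy.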
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.