droppoint commented on issue #32928:
URL: https://github.com/apache/airflow/issues/32928#issuecomment-1864543583

   @dirrao This is cool, but I already submitted PR #35800 almost a month ago. 
I think we are fixing different problems. Your PR addresses the case where 
pods are adopted from a live scheduler that merely missed a heartbeat. We 
hit that problem too, but resolved it by simply increasing 
`scheduler_health_check_threshold`. However, there is another problem in the 
adoption cycle. The adoption of completed pods is unconditional, so even if 
a scheduler has not missed a heartbeat, another scheduler will still try to 
adopt its completed pods. This results in a bloated running set. For more 
information, you can read our analysis of the situation here 
(https://github.com/apache/airflow/issues/32928#issuecomment-1820413530). 
Please feel free to check out our PR.
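
   To make the difference concrete, here is a minimal, self-contained sketch 
of the behavior described above. It is not the actual executor code: `Pod`, 
`Scheduler`, `adopt_completed_pods`, and the `check_owner_alive` flag are 
hypothetical names made up for this illustration, and guarding adoption on the 
owner's liveness is only one assumed way to express the idea.

```python
from dataclasses import dataclass, field


@dataclass
class Pod:
    name: str
    phase: str  # e.g. "Running" or "Succeeded"
    owner: str  # id of the scheduler that launched the pod


@dataclass
class Scheduler:
    scheduler_id: str
    alive: bool = True  # True while the scheduler keeps heartbeating
    running: set = field(default_factory=set)


def adopt_completed_pods(adopter, pods, schedulers, check_owner_alive):
    """Adopt completed pods that were launched by other schedulers.

    With check_owner_alive=False the adoption is unconditional, which is
    the behavior described in the comment. With check_owner_alive=True
    (a hypothetical guard), pods of a live owner are left alone.
    """
    for pod in pods:
        # Only completed pods that belong to someone else are candidates.
        if pod.phase != "Succeeded" or pod.owner == adopter.scheduler_id:
            continue
        # Hypothetical guard: skip pods whose owner is still alive.
        if check_owner_alive and schedulers[pod.owner].alive:
            continue
        pod.owner = adopter.scheduler_id
        adopter.running.add(pod.name)


a, b = Scheduler("a"), Scheduler("b")
schedulers = {"a": a, "b": b}

# Scheduler "a" is alive, yet "b" still takes over its completed pod.
adopt_completed_pods(b, [Pod("p1", "Succeeded", "a")], schedulers,
                     check_owner_alive=False)
print(sorted(b.running))
```

   In this toy model the unconditional call leaves `p1` in the running set of 
scheduler `b` even though scheduler `a` never missed a heartbeat. Passing 
`check_owner_alive=True` instead would skip the pod.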
   
   P.S. Can you share the secret of how you managed to get a review from the 
maintainer so fast?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]