droppoint commented on issue #32928: URL: https://github.com/apache/airflow/issues/32928#issuecomment-1864543583
@dirrao This is cool, but I already submitted PR #35800 almost a month ago, and I think we are fixing different problems.

Your PR addresses the case where another scheduler adopts tasks from a live scheduler that merely skipped a heartbeat. We hit that problem too, and we worked around it by increasing scheduler_health_check_threshold.

However, there is another problem in the adoption cycle. The "adoption" of completed pods is unconditional: even if a scheduler did not skip a heartbeat, another scheduler will still try to adopt its "completed" pods. This bloats the running set. For more details, see our analysis of the situation here (https://github.com/apache/airflow/issues/32928#issuecomment-1820413530). Please feel free to check out our PR.

P.S. Can you share the secret of how you managed to get a review from a maintainer so fast?
