teastburn commented on issue #7935: URL: https://github.com/apache/airflow/issues/7935#issuecomment-708587822
We have a change that correlates with (causation is not yet verified) fixing the issue that @sylr mentioned [here](https://github.com/apache/airflow/issues/7935#issuecomment-667343505), where many scheduler main processes spawn at the same time and then disappear (which [caused an OOM error for us](https://github.com/apache/airflow/issues/11365)). The change was the following:

```
AIRFLOW__CORE__SQL_ALCHEMY_POOL_SIZE
- 5
+ 11
AIRFLOW__CORE__SQL_ALCHEMY_MAX_OVERFLOW
- 10
+ 30
AIRFLOW__CORE__SQL_ALCHEMY_POOL_RECYCLE
- 3600
+ 1800
```

We also run with MAX_THREADS=10. Is it possible that reaching pool_size or pool_size+max_overflow caused processes to back up or spawn oddly?

Before this change, the scheduler was getting stuck 1-2 times per day; we have not seen the issue since making the change 6 days ago.

<details><summary>We no longer see many processes spawning at once like this:</summary>
<p>

```
$ while true; do pgrep -f 'airflow scheduler' | wc -l; sleep .5; done
39
4
4
4
39
39
39
39
39
5
5
5
5
5
5
5
3
3
3
38
3
3
2
2
2
2
2
37
2
2
2
2
2
2
2
7
2
8
3
8
2
4
3
3
3
3
2
2
2
2
2
2
2
2
4
3
3
3
9
3
3
3
13
3
3
3
17
2
2
2
2
2
2
2
24
2
2
4
```

</p>
</details>

Can anyone else verify whether this change helps?
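
For anyone weighing the pool_size / max_overflow question above, here is a minimal sketch of the arithmetic. It relies on one documented property of SQLAlchemy's default `QueuePool`: an engine hands out at most `pool_size + max_overflow` connections at once, and further checkouts wait. The `PoolSettings` class and the comparison against MAX_THREADS are hypothetical illustrations, not Airflow code, and whether the scheduler's workers actually share a single pool is an assumption this sketch does not verify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PoolSettings:
    """Hypothetical holder for the three settings changed in the comment."""
    pool_size: int      # connections kept open persistently
    max_overflow: int   # extra connections allowed beyond pool_size
    pool_recycle: int   # seconds after which a connection is replaced

    @property
    def max_connections(self) -> int:
        # QueuePool ceiling: checkouts beyond this wait for a free connection.
        return self.pool_size + self.max_overflow


# Values taken from the comment above.
before = PoolSettings(pool_size=5, max_overflow=10, pool_recycle=3600)
after = PoolSettings(pool_size=11, max_overflow=30, pool_recycle=1800)
MAX_THREADS = 10

for label, settings in (("before", before), ("after", after)):
    # Assumption: all MAX_THREADS workers draw from one pool at the same time.
    covers_threads = settings.pool_size >= MAX_THREADS
    print(
        f"{label}: pool_size={settings.pool_size}, "
        f"ceiling={settings.max_connections}, "
        f"pool_size >= MAX_THREADS: {covers_threads}"
    )
```

Under that shared-pool assumption, the old pool_size of 5 is below MAX_THREADS while the new value of 11 is above it, which would be consistent with the observed correlation but does not establish causation.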
