teastburn commented on issue #7935:
URL: https://github.com/apache/airflow/issues/7935#issuecomment-708587822


   We have a change that correlates with (causation is not yet verified) fixing 
the issue that @sylr mentioned 
[here](https://github.com/apache/airflow/issues/7935#issuecomment-667343505), 
where many scheduler main processes spawn at the same time and then disappear 
(which [caused an OOM error for 
us](https://github.com/apache/airflow/issues/11365)). 
   
   The change was the following:
   ```
   AIRFLOW__CORE__SQL_ALCHEMY_POOL_SIZE
   - 5
   + 11
   AIRFLOW__CORE__SQL_ALCHEMY_MAX_OVERFLOW
   - 10
   + 30
   AIRFLOW__CORE__SQL_ALCHEMY_POOL_RECYCLE
   - 3600
   + 1800
   ```
   We also run MAX_THREADS=10. Is it possible that reaching pool_size or 
pool_size+max_overflow caused processes to back up or spawn oddly? Before this 
change, the scheduler was getting stuck 1-2 times per day; we have not seen 
the issue in the 6 days since the change.
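
For reference, a minimal Python sketch of the arithmetic behind that question. It relies on the SQLAlchemy QueuePool rule that a single pool opens at most pool_size + max_overflow connections at once. The helper name `pool_ceiling` is hypothetical, and the assumption that each scheduler process holds its own pool is not verified here; this only compares the numbers and is not a diagnosis.

```python
def pool_ceiling(pool_size: int, max_overflow: int) -> int:
    """Upper bound on simultaneous connections for one SQLAlchemy QueuePool."""
    return pool_size + max_overflow


MAX_THREADS = 10  # value reported in the comment

# Settings before and after the change, copied from the comment.
before = {"pool_size": 5, "max_overflow": 10}
after = {"pool_size": 11, "max_overflow": 30}

before_ceiling = pool_ceiling(**before)
after_ceiling = pool_ceiling(**after)

for label, cfg, ceiling in (
    ("before", before, before_ceiling),
    ("after", after, after_ceiling),
):
    # Assumption: MAX_THREADS workers could share one pool; unverified.
    covers_threads = cfg["pool_size"] >= MAX_THREADS
    print(
        f"{label}: ceiling={ceiling} connections per pool, "
        f"pool_size >= MAX_THREADS: {covers_threads}"
    )
```

Under these numbers the old pool_size was smaller than MAX_THREADS and the new one is larger, which may or may not be related to the behavior described.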
   
   <details><summary>We no longer see many processes spawning at once, as in 
this earlier output:</summary>
   
   <p>
   
   ```
   $ while true; do pgrep -f 'airflow scheduler' | wc -l; sleep .5; done
   39
   4
   4
   4
   39
   39
   39
   39
   39
   5
   5
   5
   5
   5
   5
   5
   3
   3
   3
   38
   3
   3
   2
   2
   2
   2
   2
   37
   2
   2
   2
   2
   2
   2
   2
   7
   2
   8
   3
   8
   2
   4
   3
   3
   3
   3
   2
   2
   2
   2
   2
   2
   2
   2
   4
   3
   3
   3
   9
   3
   3
   3
   13
   3
   3
   3
   17
   2
   2
   2
   2
   2
   2
   2
   24
   2
   2
   4
   ```
   
   </p>
   
   </details>
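
For anyone trying to reproduce this, a rough Python sketch of how spikes in output like the above could be counted. The function name `find_spikes` and the threshold of 20 are hypothetical choices for illustration; the sample values are the first ten counts from the output above.

```python
def find_spikes(counts, threshold):
    """Return (index, count) pairs for samples strictly above the threshold."""
    return [(i, c) for i, c in enumerate(counts) if c > threshold]


# First ten samples copied from the pgrep output above.
samples = [39, 4, 4, 4, 39, 39, 39, 39, 39, 5]

SPIKE_THRESHOLD = 20  # hypothetical cut-off, not an Airflow setting

spikes = find_spikes(samples, SPIKE_THRESHOLD)
print(f"{len(spikes)} of {len(samples)} samples exceeded {SPIKE_THRESHOLD}")
```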
   
   Can anyone else verify this change helps or not?



