potiuk opened a new issue #14782:
URL: https://github.com/apache/airflow/issues/14782


   Hey @ashb - following your request to observe the self-hosted behavior.
   
   I believe the current scale-in settings are slightly too aggressive, 
especially when the traffic is low (weekends).
   
   I pushed a number of builds here (and @turbaszek as well):
   
   
https://github.com/apache/airflow/actions/workflows/build-images-workflow-run.yml
   
   And quite a lot of them failed without logs indicating the scale-in 
event happened. Some of them failed with 'git', some of them explicitly with 
`lost communication`.
   
   Some examples here: 
   
   * https://github.com/apache/airflow/actions/runs/651685483
   * https://github.com/apache/airflow/actions/runs/651569791
   * https://github.com/apache/airflow/actions/runs/651571502
   * https://github.com/apache/airflow/actions/runs/651550722
   * https://github.com/apache/airflow/actions/runs/651642721
   * https://github.com/apache/airflow/actions/runs/651688017
   
   (maybe some of those were cancelled as duplicates - but at most 1 or 2)
   
   At the same time a number of those jobs succeeded, so I think the scale-in 
events are the ones to blame.
   
   The previous setting was much more stable (but more costly as well). 
However, I think I will merge #14531, which should significantly decrease the 
time needed from the runners, so hopefully we will be able to tune the 
scale-in settings so that they are more stable. 
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

