potiuk commented on code in PR #60919:
URL: https://github.com/apache/airflow/pull/60919#discussion_r2717204287


##########
airflow-core/src/airflow/config_templates/config.yml:
##########
@@ -1446,6 +1446,24 @@ api:
       type: string
       example: ~
       default: "False"
+    worker_refresh_interval:
+      description: |
+        Number of seconds between rolling worker refreshes for the API server.
+        When enabled, workers are periodically restarted using a rolling 
restart
+        pattern (new workers spawn before old ones are killed) to prevent 
memory
+        accumulation while maintaining zero downtime. Set to 0 to disable.
+      version_added: 3.2.0
+      type: integer
+      example: ~
+      default: "0"

Review Comment:
   Should we use some more robust default here? 
   
   I guess we could have min 2 workers **by default** and have rolling restarts 
enabled by default, and refresh interval set to some default, reasonable  value 
(say 1hr).
   
   While it increases memory used as explained in the PR description, it is far 
more robust for "out-of-the-box" installation, and if users would like to save 
memory by limiting it to 1, they still should be able to do so (and then we 
could disable the uvicorn restarts altogether). I think it's more important to 
have "robust non leaking, gently  restarting" by default, rather than 
"potentially leaking smaller initial memory".
   
   Since Fast API recommendation is 1 worker per Pod - we could also recommend 
to the users - in the same docs -  to use the rolling restarts of deployment 
instead those restarts that  are handled by Uvicorn monitor. And we could add 
description to the number of workers paraneter, that setting it to 1 is ok as 
long as you have more than one instance of api-server and implement refreshing 
rolling restart outside of Airlfow. This way, the users who would like to 
change it to 1 (and read the docs that is) - would know that they also need to 
take care about potential leaks. 
   
   There are even some ways to do it automatically - which we could implement 
in our Helm Chart as an option potentially:
   
   
https://stackoverflow.com/questions/67423919/k8s-cronjob-rolling-restart-every-day
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to