potiuk commented on PR #60804:
URL: https://github.com/apache/airflow/pull/60804#issuecomment-3777576754

   > It’s surprising that the API server without cache eviction can cause 
system instability.
   
   One question. In uvicorn in Airflow 2 we had a much simpler solution: the 
uvicorn servers simply restarted every few (tens?) of minutes or every N 
requests, effectively clearing the cache and also getting rid of some other 
side effects (and, for example, reloading UI plugins). Since the api-server 
(except for the cache) is essentially stateless, that had almost no negative 
side effects, apart from some load at startup and the database refresh 
happening then, and that is not much different from what the caching 
implemented here provides. 
   
   Additionally, that approach was far more "resilient" to any kind of 
accumulation-type bug. Yes, it was hiding such bugs as well, but the overall 
stability and resilience to mistakes in memory usage, side effects of imports, 
or global state sharing ended up high.
   
   This approach is called "software rejuvenation" 
https://ieeexplore.ieee.org/document/466961 - there are studies and 
recommendations to use it, as it is effectively far more resilient and, in 
complex systems, it handles a much wider range of issues. 
   
   Maybe we should explore that as well. I am not sure whether 
FastAPI/Starlette has a similar concept, but for all kinds of stateless 
webservers, the technique of restarting them gracefully while load-balancing 
requests has a long, proven history.
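   
   To make the idea concrete, here is a minimal sketch of the "restart every N 
requests or every few minutes" rule, assuming a supervisor that can drain and 
replace a worker when asked. `RejuvenationPolicy` and its parameters are 
hypothetical names used only for illustration; this is not an Airflow, uvicorn, 
or FastAPI API. Existing knobs in the same spirit include uvicorn's 
`--limit-max-requests` option and gunicorn's `max_requests` setting.

```python
import time
from typing import Callable


class RejuvenationPolicy:
    """Hypothetical helper: decides when a stateless worker should be recycled."""

    def __init__(
        self,
        max_requests: int,
        max_age_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_requests = max_requests
        self.max_age_seconds = max_age_seconds
        self._clock = clock          # injectable so the policy can be tested
        self._started_at = clock()   # worker start time
        self._handled = 0            # requests served since start

    def record_request(self) -> None:
        """Call once per completed request."""
        self._handled += 1

    def should_restart(self) -> bool:
        """True once either the request budget or the age budget is used up."""
        too_many = self._handled >= self.max_requests
        too_old = self._clock() - self._started_at >= self.max_age_seconds
        return too_many or too_old
```

   A supervisor would check `should_restart()` after each request, stop 
routing new requests to that worker, let in-flight requests finish, and then 
start a fresh process, which drops any accumulated cache or global state.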
   
   Should we possibly do it instead of LRU/TTL caching? That seems far more 
robust, if it is easy and supported by FastAPI.
   


