potiuk commented on PR #60804: URL: https://github.com/apache/airflow/pull/60804#issuecomment-3777576754
> It’s surprising that the API server without cache eviction can cause system instability.

One question. In Airflow 2 we had a much simpler solution with uvicorn: the servers were simply restarted every few (tens of?) minutes or every N requests. That effectively cleaned the cache and also got rid of some other side effects (for example, it reloaded UI plugins). Since the api-server is essentially stateless apart from the cache, this had almost no negative side effects. The only cost was some extra load at startup time and the database refresh that happens then, which is not much different from the refresh the caching implemented here provides.

Additionally, that approach was far more resilient to any kind of accumulation-type bug. Yes, it was hiding those bugs as well, but the overall stability and resilience to mistakes in memory usage, side effects of imports, or shared global state ended up high. This approach is called "software rejuvenation" https://ieeexplore.ieee.org/document/466961 and there are studies and recommendations to use it, because it is considerably more resilient and, in complex systems, it handles a much wider range of issues.

Maybe we should explore that as well. I am not sure whether FastAPI/Starlette has a similar concept, but for all kinds of stateless webservers, the technique of restarting them gracefully while load-balancing requests has a long, proven history. Should we possibly do that instead of LRU/TTL caching? It seems far more robust, if it is easy and supported by FastAPI.
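To make the request-count variant of rejuvenation concrete, here is a minimal, self-contained simulation. It is a sketch under stated assumptions, not Airflow code: `Worker` and `RejuvenatingPool` are hypothetical names, each "worker" stands in for one stateless server process with an unbounded in-process cache, and requests are round-robined across the pool. When a worker reaches its request budget (a base limit plus random jitter, so the whole pool never recycles at once), it is replaced by a fresh worker with an empty cache while the other workers keep serving. For real deployments, gunicorn's `max_requests` / `max_requests_jitter` settings and uvicorn's `--limit-max-requests` option implement the same request-count idea; the time-based variant mentioned above would swap the request counter for a max-age check.

```python
import itertools
import random
from dataclasses import dataclass, field


@dataclass
class Worker:
    """Hypothetical stand-in for one stateless api-server worker process."""
    worker_id: int
    max_requests: int  # recycle threshold for this worker, jitter already applied
    handled: int = 0
    cache: dict = field(default_factory=dict)  # unbounded in-process state

    def handle(self, key: str) -> str:
        self.handled += 1
        # No eviction: this is the accumulation a periodic restart wipes out.
        return self.cache.setdefault(key, f"value-for-{key}")

    def should_retire(self) -> bool:
        return self.handled >= self.max_requests


class RejuvenatingPool:
    """Round-robin pool that replaces a worker once it reaches its request budget."""

    def __init__(self, size: int, max_requests: int, jitter: int, seed: int = 0):
        self._rng = random.Random(seed)
        self._max_requests = max_requests
        self._jitter = jitter
        self._ids = itertools.count()
        self.workers = [self._spawn() for _ in range(size)]
        self._next = 0
        self.restarts = 0

    def _spawn(self) -> Worker:
        # Jitter staggers restarts so workers do not all recycle at the same time.
        budget = self._max_requests + self._rng.randint(0, self._jitter)
        return Worker(next(self._ids), budget)

    def dispatch(self, key: str) -> str:
        idx = self._next
        self._next = (self._next + 1) % len(self.workers)
        worker = self.workers[idx]
        result = worker.handle(key)
        if worker.should_retire():
            # The request has completed, so swap in a fresh worker (empty cache).
            self.workers[idx] = self._spawn()
            self.restarts += 1
        return result


if __name__ == "__main__":
    pool = RejuvenatingPool(size=4, max_requests=100, jitter=20)
    for i in range(1000):
        pool.dispatch(f"dag-{i}")
    print("restarts:", pool.restarts)
    print("cache sizes:", [len(w.cache) for w in pool.workers])
```

Every request is still served, yet no live worker's cache can exceed its budget (at most `max_requests + jitter` entries). That bound holds even though the cache itself has no eviction logic, which is the resilience argument made above.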
