potiuk commented on issue #28941: URL: https://github.com/apache/airflow/issues/28941#issuecomment-1384493990
Then it needs to be looked at by someone who knows Celery and Redis better than I do. I do not know Redis that well. I guess there are delays in processing and saving the stored data. If you kill Redis abruptly, by kill -9 for example, then (obviously, like any other software) it might lose some data that it keeps in memory, and absolutely nothing can be done about it. There will be hanging tasks in this case, which you will have to clear. That's the usual recovery mechanism from catastrophic failures. No system in the world can really be made resilient to that unless you accept a lot of operational overhead and redundancy (and if you would like to do that, then it is more of a deployment issue).

I think you should make sure that you are stopping Redis in the "gentle" way that gives it a chance to flush everything to disk, and make sure that it is actually restoring the data from there.

Please then open a new issue with the Helm chart, ideally showing all the logs (including debug logs showing Redis storing and restoring the data, to make sure that it actually happens). If you can reproduce the problem knowing that Redis is storing/restoring the queue, then I think that's something that someone who is a Celery expert should take a look at, so it's worth opening an issue.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
