potiuk commented on issue #28941:
URL: https://github.com/apache/airflow/issues/28941#issuecomment-1384493990

   Then it needs to be looked at by someone who knows celery and redis better 
than I do. I do not know redis that well - I guess there are delays in 
processing and saving the stored data. If you kill redis abruptly, by kill -9 
for example, then (obviously, like any other software) it might lose some data 
that it keeps in memory and absolutely nothing can be done about it. There will 
be hanging tasks in this case, which you will have to clear. That's the usual 
recovery mechanism from catastrophic failures. No system in the world can 
really be made resilient to it unless you accept a lot of operational overhead 
and redundancy (and if you would like to do that, then it is more of a 
deployment issue).
   
   I think you should make sure that you are stopping redis in the "gentle" way 
that gives it a chance to flush everything to the disk, and make sure that it 
is actually restoring the data from there.
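
   To illustrate what "gentle" means at the process level, here is a minimal 
sketch, assuming redis runs as a child process you control (in a Helm 
deployment the container runtime sends the signal instead). SIGTERM can be 
caught, so the process gets a chance to shut down in an orderly way, while 
kill -9 (SIGKILL) cannot be caught at all. Whether redis actually writes its 
data out on SIGTERM depends on its persistence configuration, so treat that as 
an assumption to verify in the logs. The helper name `stop_gently` and the 
default timeout are hypothetical.

```python
import signal
import subprocess


def stop_gently(proc: subprocess.Popen, timeout: float = 30.0) -> bool:
    """Ask a process to exit with SIGTERM and wait for it to finish.

    Returns True if the process exited within `timeout` seconds,
    False if it is still running (the caller decides what to do next).
    """
    if proc.poll() is not None:
        return True  # already exited, nothing to do

    # SIGTERM is catchable, so the process can flush state before exiting.
    # SIGKILL (kill -9) is not, which is why it can lose in-memory data.
    proc.send_signal(signal.SIGTERM)
    try:
        proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        return False  # still running; do not escalate silently
    return True
```

   The function deliberately does not fall back to SIGKILL on timeout: 
escalating is exactly the abrupt case described above, so that decision is 
left to the caller.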
   
   Please then open a new issue for the Helm chart, ideally showing all the 
logs (including debug logs showing redis storing and restoring the data - to 
make sure that it actually happens). If you can reproduce that knowing that 
redis is storing/restoring the queue, then I think that's something that 
someone who is a celery expert should take a look at, so it's worth opening an 
issue.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
