potiuk commented on issue #24731: URL: https://github.com/apache/airflow/issues/24731#issuecomment-1322343146
I have no idea how your liveness probe works. But generally, any software that manages other running software (i.e. a deployment system such as Kubernetes) follows the same sequence of events:

- It checks whether the software is running and responding to some kind of liveness probe (see how the liveness probe is defined in our Helm Chart, for example).
- When the liveness probe keeps failing for some time (usually several consecutive checks), it declares the component unhealthy and attempts to stop it.
- Usually this happens via SIGTERM and other "soft" signals that allow the component to "kill itself" and clean up. If the software is able to shut itself down cleanly, it will remove its ".pid" file and similar leftovers.
- When that does not succeed, it escalates the signal (for example SIGTERM -> SIGHUP -> SIGKILL), giving the process time to actually react and clean up. SIGKILL cannot be handled: it terminates the process immediately, and some leftovers (like the .pid file) remain.

Only after that sequence, once it knows that your component's process is down, should the restart happen. If this condition is fulfilled, it does not matter that worker.pid was not deleted, because the process is no longer running (at worst it was SIGKILLed). When Airflow starts the next time and finds the .pid file still there, it checks whether the process specified in the .pid file is running; if not, it deletes the pid file and starts. It refuses to start only when the process in the .pid file is still running.

This is general advice. This is how .pid files work for any process; nothing Airflow-specific. All software should be managed this way. I have no idea how docker-compose handles killing, but it should do the same, and you should configure docker-compose to behave in exactly this way (this is what Kubernetes does, for example). You should look at the internals of docker-compose's behaviour when it restarts Airflow in such a case. I honestly don't know how to do it with docker-compose. Maybe it is possible, maybe not, maybe it requires some tricks to make it work.
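As a rough illustration of the stop-then-restart sequence and the stale .pid check described above, here is a minimal, generic Python sketch. It is not Airflow's, Kubernetes' or docker-compose's actual code: the function names `stop_with_escalation`, `pid_is_running` and `claim_pidfile` are hypothetical, it assumes a POSIX system, and for simplicity it escalates directly from SIGTERM to SIGKILL after a grace period.

```python
import os
import subprocess


def stop_with_escalation(proc: subprocess.Popen, grace_seconds: float = 10.0) -> int:
    """Stop a child process politely first, then forcibly; return its exit code."""
    # SIGTERM (on POSIX) can be handled, letting the process clean up, e.g. remove its .pid file.
    proc.terminate()
    try:
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        # SIGKILL cannot be handled: the process dies at once and leftovers stay behind.
        proc.kill()
        return proc.wait()


def pid_is_running(pid: int) -> bool:
    """Return True if a process with this PID exists (POSIX 'signal 0' check)."""
    if pid <= 0:
        return False  # 0 and negative values address process groups, not a single PID
    try:
        os.kill(pid, 0)  # signal 0 delivers nothing; it only checks that the PID exists
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def claim_pidfile(path: str) -> None:
    """Refuse to start if the recorded PID is alive; otherwise overwrite the stale file."""
    try:
        with open(path) as f:
            old_pid = int(f.read().strip())
    except (FileNotFoundError, ValueError):
        old_pid = None  # no file, or unreadable content: nothing to protect
    # A PID equal to our own must be stale (PIDs are reused, e.g. after a container restart).
    if old_pid is not None and old_pid != os.getpid() and pid_is_running(old_pid):
        raise RuntimeError(f"refusing to start: PID {old_pid} from {path} is still running")
    with open(path, "w") as f:
        f.write(str(os.getpid()))
```

A supervisor following this pattern would call `stop_with_escalation(worker)` and only then start a new worker, which calls `claim_pidfile("worker.pid")` on startup. Because PIDs can be reused, a PID-existence check is only a heuristic; this is why the sketch treats a recorded PID equal to its own as stale.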
I personally think of docker-compose as a very poor deployment option that lacks a lot of the features and stability that a "real" production deployment like Kubernetes provides. In my opinion it lacks some of the automation and deployment features, precisely the kind you observe when you want to do some "real production stuff" with the software. Maybe that is because I do not know it well, maybe because it is hard, maybe because it is impossible. But I believe it is a very poor cousin of K8S when it comes to running "real/serious" production deployments. When you choose it, you take on the responsibility, as deployment manager, to sometimes do manual recovery where docker-compose will not do it for you. It's one of the responsibilities you take on your shoulders. And we as a community decided not to spend our time on making a "production-ready" docker-compose deployment, because we do not know what advice to give there, and those who decide to go this path have to solve these problems on their own in the way that is best for them.

In contrast, the Helm Chart, which we maintain, is able to solve a lot of those problems (including liveness probes, restarts, etc.). It is much closer to something that runs "out of the box": once you have the resources sorted out, a lot of the management is handled for you by the Helm/Kubernetes combination we prepared. I am afraid you made the choice to use docker-compose. We warned that the one we have is not suitable for production (it's a quick-start); making it production-ready requires a lot of work, and you need to become a docker-compose expert to solve those problems. You can also take a look here, where we explain what kind of skills you need to have: https://airflow.apache.org/docs/apache-airflow/stable/installation/index.html#using-production-docker-images If you want to stick with docker-compose, good luck; you will run into a lot of issues like this one.
If you find some solutions, you can contribute them back to our docs as "good practices" (but we will never turn them into "this is how you run a docker-compose deployment", as this is impossible to turn into a general set of advice; at most it might be advice of the form "if you get into this trouble, maybe this solution will work").
