potiuk commented on issue #24731:
URL: https://github.com/apache/airflow/issues/24731#issuecomment-1322343146

   I don't know how your liveness probe works. But generally all software that 
manages other running software (e.g. a deployment manager like Kubernetes) 
follows the usual sequence of events:
   
   - check whether the software is running and responding to some kind of 
liveness probe (see how the liveness probe is defined in our Helm Chart, for 
example)
   - when the liveness probe fails for some time (usually several times), it 
declares the component unhealthy and attempts to stop it
   - usually this happens via SIGTERM and other "soft" signals that allow the 
component to "kill itself" and clean up (usually, if the software is able to 
shut itself down cleanly, it will remove all the ".pid" files and the like)
   - when that does not succeed, it will escalate the signal (SIGTERM -> SIGHUP 
-> SIGKILL), giving the process time to actually react and clean up. SIGKILL 
cannot be handled; it shuts down the process immediately and some stuff (like 
.pid files) remains
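
   The escalation step above can be sketched in Python roughly as follows. This 
is only an illustration of the general pattern, not what Kubernetes or 
docker-compose actually run; the function name `stop_with_escalation` and the 
`grace_seconds` parameter are made up for this example, and it assumes a POSIX 
system (SIGHUP does not exist on Windows).

```python
import signal
import subprocess


def stop_with_escalation(proc, soft_signals=(signal.SIGTERM, signal.SIGHUP),
                         grace_seconds=5.0):
    """Stop `proc`, escalating from soft signals to SIGKILL.

    Hypothetical helper: returns the last signal that was sent before
    the process was confirmed to be down.
    """
    for sig in soft_signals:
        # Soft signals can be handled, so the process gets a chance to
        # clean up (for example, remove its .pid file).
        proc.send_signal(sig)
        try:
            proc.wait(timeout=grace_seconds)
            return sig
        except subprocess.TimeoutExpired:
            continue  # still alive after the grace period: escalate

    # SIGKILL cannot be handled: the process dies immediately and any
    # leftover files (like a .pid file) stay behind.
    proc.kill()
    proc.wait()
    return signal.SIGKILL
```

   The important part is that the function only returns once `wait()` has 
confirmed the process is gone, so whatever restarts the component can rely on 
the old process being down.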
   
   
   ONLY AFTER that SEQUENCE, knowing that your component process is down, 
should the "restart" happen.
   
   If this is fulfilled, it does not matter that the worker.pid is not deleted, 
because the process is not running any more (at most it was SIGKILLed). When 
Airflow starts next time and the .pid file is still there, it will check whether 
the process specified in the .pid file is running and, if not, it will delete 
the .pid file and run. Only when the process in the .pid file is still running 
will it refuse to start.
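
   A minimal sketch of that stale-.pid check, assuming a POSIX system. This is 
not Airflow's actual implementation; the names `pid_is_running` and 
`claim_pid_file` are hypothetical and only illustrate the general .pid file 
convention described above.

```python
import os
from pathlib import Path


def pid_is_running(pid):
    """Return True if some process with this PID currently exists."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 sends nothing, it only checks existence
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def claim_pid_file(pid_file):
    """Return True if we may start, False if the recorded process is alive.

    Hypothetical helper: a leftover file whose process is gone is
    treated as stale, deleted, and replaced with our own PID.
    """
    path = Path(pid_file)
    if path.exists():
        try:
            old_pid = int(path.read_text().strip())
        except ValueError:
            old_pid = None  # unreadable content is treated as stale
        if old_pid is not None and pid_is_running(old_pid):
            return False  # refuse to start
        path.unlink()
    path.write_text(str(os.getpid()))
    return True
```

   Note that the check only asks whether some process has the recorded PID, 
which is why it matters that the old process is confirmed down before the 
restart happens.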
   
   And this is general advice. This is how a .pid file works for any process. 
Nothing Airflow-specific. All software should be managed this way. 
   
   I have no idea how docker-compose handles killing, but it should do the 
same, and you should configure docker-compose to behave in exactly this way 
(this is what, for example, Kubernetes does). But you should look at the 
internals of docker-compose behaviour when restarting Airflow in such a case. I 
honestly don't know how to do it with docker-compose. Maybe it is possible, 
maybe not, maybe it requires some tricks to make it work.
   
   I personally think of docker-compose as a very poor deployment that lacks 
a lot of the features and stability that a "real" production deployment like 
Kubernetes has. In my opinion it lacks some of the automation and some of the 
deployment features - precisely the kind you observe when you want to do 
some "real production stuff" with the software. Maybe it is because I do not 
know it, maybe because it is hard, maybe because it is impossible. But I 
believe it is a very poor cousin of K8S when it comes to running 
"real/serious" production deployments. When you choose it, you take on the 
responsibility, as deployment manager, to sometimes do manual recovery 
where docker-compose will not do it for you. It's one of the responsibilities 
you take on your shoulders. 
   
   And we as a community decided not to spend our time on making a 
"production-ready" docker-compose deployment - because we do not know what 
advice to give, and those who decide to go this path have to solve these 
problems on their own in the way that is best for them. 
   
   Contrary to that, there is the "Helm Chart", which we maintain and where we 
are able to solve a lot of those problems (including liveness probes, restarts 
etc.). It is much closer to something that runs "out-of-the-box" - once you 
have resources sorted out, a lot of the management is handled for you by the 
Helm/Kubernetes combo we prepared.
   
   I am afraid you made the choice to use docker-compose. We warned that the 
one we have is not suitable for production (it's a quick-start), that it 
requires a lot of work to make it so, and that you need to become a 
docker-compose expert to solve such problems.
   
   Also you can take a look here, where we explain what kind of skills you need 
to have:
   
   
https://airflow.apache.org/docs/apache-airflow/stable/installation/index.html#using-production-docker-images
   
   If you want to stick with docker-compose - good luck, you will have a lot 
of things like that. If you find some solutions, you can contribute them back 
to our docs as "good practices" (but we will never turn that into "this is how 
you run a docker-compose deployment", as this is impossible to make into a 
general set of advice - at most this might be some advice of the form "if you 
get into this trouble -> maybe this solution will work"). 
   

