[GitHub] [airflow] potiuk commented on issue #24731: Celery Executor : After killing Redis or Airflow Worker Pod, queued Tasks not getting executed even after pod is up.

GitBox Mon, 21 Nov 2022 08:37:42 -0800


potiuk commented on issue #24731:
URL: https://github.com/apache/airflow/issues/24731#issuecomment-1322343146

I have no idea how your liveness probe works. But generally, any software that
manages other running software (i.e. a deployment tool like Kubernetes) follows
the usual sequence of events:

- check if the software is running and responding via some kind of liveness
probe (see how the liveness probe is defined in our Helm Chart, for example)
- when the liveness probe fails for some time (usually several times), it
declares the component unhealthy and attempts to stop it
- usually this happens via SIGTERM and other "soft" signals that allow the
component to "kill itself" and clean up (usually, if the software is able to
shut itself down cleanly, it will remove all the ".pid" files and the like)
- when that does not succeed, it will escalate the signal (SIGTERM -> SIGHUP ->
SIGKILL), giving the process time to actually react and clean up. SIGKILL
cannot be handled: it shuts down the process immediately, and some stuff (like
.pid files) remains
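
The stop sequence above can be sketched in a few lines of Python. This is only
an illustration of the escalation idea, not what Kubernetes or Airflow actually
run: the function name `stop_with_escalation` and the `grace_seconds` parameter
are made up for this example, and it assumes a POSIX system (SIGHUP and SIGKILL
are not available on Windows).

```python
import signal
import subprocess


def stop_with_escalation(proc, grace_seconds=5.0):
    """Stop a subprocess.Popen child, escalating SIGTERM -> SIGHUP -> SIGKILL.

    Returns the signal after which the process was observed to have exited.
    (Hypothetical helper, for illustration only.)
    """
    # "Soft" signals first: the process may catch them and clean up
    # (for example, remove its .pid file) before exiting.
    for sig in (signal.SIGTERM, signal.SIGHUP):
        proc.send_signal(sig)
        try:
            proc.wait(timeout=grace_seconds)
            return sig
        except subprocess.TimeoutExpired:
            pass  # still alive after the grace period, escalate

    # SIGKILL cannot be caught, so no cleanup happens and files such as
    # .pid files are left behind.
    proc.kill()
    proc.wait()
    return signal.SIGKILL
```

A supervisor would call this on the unhealthy component and start the
replacement only after the function has returned, because at that point the old
process is known to be gone.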

ONLY AFTER that SEQUENCE, knowing that your component process is down, should
the "restart" happen.

If this is fulfilled, it does not matter that worker.pid was not deleted,
because the process is not running any more (at most it was SIGKILLed). When
Airflow starts next time and the .pid file is still there, it will check whether
the process specified in the .pid file is running and, if not, it will delete
the pid file and run. Only when the process in the .pid file is still running
will it refuse to start.
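
As a rough sketch of the stale .pid check described above (not Airflow's actual
code; `pid_is_running` and `acquire_pid_file` are hypothetical names chosen for
this example, and POSIX is assumed):

```python
import os
from pathlib import Path


def pid_is_running(pid):
    """Return True if a process with this pid currently exists (POSIX)."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 only checks existence, nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def acquire_pid_file(path):
    """Write our pid to `path`, refusing only if the recorded process is alive."""
    path = Path(path)
    if path.exists():
        try:
            old_pid = int(path.read_text().strip())
        except ValueError:
            old_pid = -1  # unreadable content is treated as stale
        if pid_is_running(old_pid):
            raise RuntimeError(f"process {old_pid} from {path} is still running")
        path.unlink()  # stale file left behind, e.g. after a SIGKILL
    path.write_text(f"{os.getpid()}\n")
```

With a check like this, a leftover .pid file blocks startup only while the old
process is really still alive, which is why the restart has to wait until the
old process is down.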

And this is general advice. This is how .pid files work for any process.
Nothing Airflow-specific. All software should be managed this way.

I have no idea how docker-compose and killing work, but it should do the
same, and you should configure docker-compose to behave in exactly this way
(this is what, for example, Kubernetes does). But you should look at the
internals of docker-compose behaviour when restarting Airflow in such a case. I
honestly don't know how to do it with docker-compose. Maybe it is possible,
maybe not, maybe it requires some tricks to make it work.

I personally think of docker-compose as a very poor deployment that lacks
a lot of the features and a lot of the stability that a "real" production
deployment like Kubernetes has. In my opinion it lacks some of the automation
and some of the deployment features - precisely the kind you observe when you
want to do some "real production stuff" with the software. Maybe it is because I
do not know it, maybe because it is hard, maybe because it is impossible. But I
believe it is a very poor cousin of K8S when it comes to running
"real/serious" production deployments. When you choose it, you take the
responsibility on yourself, as deployment manager, to sometimes do manual
recovery where docker-compose will not do it for you. It's one of the
responsibilities you take on your shoulders.

And we as a community decided not to spend our time on making a
"production-ready" docker-compose deployment - because we know we are not in a
position to give advice on it, and that those who decide to go this path have
to solve these problems on their own, in the way that is best for them.

Contrary to that, the "Helm Chart" is something we maintain, and with it we are
able to solve a lot of those problems (including liveness probes, restarts
etc.). It is much closer to something that runs "out-of-the-box" - once you have
resources sorted out, a lot of the management is handled for you by the
helm/kubernetes combo we prepared.

I am afraid you made the choice to use docker-compose. We warned that the one
we have is not suitable for production (it's a quick-start), that it requires a
lot of work to make it so, and that you need to become a docker-compose expert
to solve those problems.

Also you can take a look here, where we explain what kind of skills you need
to have:

https://airflow.apache.org/docs/apache-airflow/stable/installation/index.html#using-production-docker-images

If you want to stick with docker-compose - good luck, you will have a lot
of things like that. If you find some solutions - you can contribute them back
to our docs as "good practices" (but we will never turn it into "this is how you
run a docker-compose deployment", as this is impossible to make into a general
set of advice - at most this might be a hint: "if you get into this trouble
-> maybe this solution will work").

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
