potiuk commented on issue #24731: URL: https://github.com/apache/airflow/issues/24731#issuecomment-1322552034
BTW. I believe there is something very wrong with your restart scenario and configuration in general - some mistakes or misunderstandings about how the image entrypoint works.

```
ERROR: Pidfile (/opt/airflow/airflow-worker.pid) already exists. Seems we're already running? (pid: 1)
```

I think there are a few things you are doing wrong here, and they compound:

1) It seems that you run airflow as the init process in your container. This is possible, but you need to realise the consequences it has for signal propagation and do it properly. There are many traps you can fall into if you do it wrongly, so I recommend you read why the airflow image uses dumb-init as the init process and what consequences that has (especially for celery): https://airflow.apache.org/docs/docker-stack/entrypoint.html#signal-propagation The .pid file will only contain `1` if your process is started as the "init" process - which also means the container dies when that process dies. When you use dumb-init, as we do by default in our image, dumb-init has process id 1; but in your case, your airflow process will always have process id 1, and that is the original root cause of the problem you have.

2) The problem then is most likely that you write the .pid file to a shared volume, which lets the pid file survive after the container is killed. This is very, very wrong. If you rely on restarting the container and your process has PID = 1, you should never save the .pid file in a shared volume that can outlive the container - because you will get exactly the problem you have. Your airflow webserver will always start as the init process with PID = 1. So even after the old process has been killed, restarting the container creates a new process with PID 1, and airflow ends up checking the .pid file left behind by the previous "1" process against itself (which also runs with PID = 1), so it will never start. This is very much against the container philosophy.
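To make point 1) concrete, this is roughly what the default setup looks like in a compose file - a sketch with an illustrative service name and image tag, not your actual config:

```yaml
# Sketch only - service name and image tag are illustrative.
services:
  airflow-worker:
    image: apache/airflow:2.4.3
    # The official image's entrypoint runs under dumb-init, so dumb-init
    # gets PID 1 and the celery worker a PID > 1. Signals (e.g. SIGTERM
    # from "docker stop") are then forwarded to the worker correctly.
    command: celery worker
    # Do NOT mount a shared volume over the directory where the .pid file
    # is written - it must stay on the ephemeral container filesystem.
```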
The .pid file should always be stored in the ephemeral container filesystem, so that when your container is stopped, the .pid file is gone with it. In general, if you restart whole containers rather than processes, the .pid file should NEVER be stored in a shared volume - it should always be stored in the ephemeral container filesystem so that it gets automatically deleted when the whole container is killed.

So I think you should really rethink the way the entrypoint works in your images, the way the .pid files get created and stored, and the way the restart process for a failed container works - it seems all three are custom-done by you, and they compound into the problem you experience. When you use the docker-compose approach, you need to realise how all this works, how those elements interact, and how to make it production-robust. You have chosen a pretty hard path to walk; going the beaten Helm + Kubernetes path, without diverging too much from the approach we propose, would have solved most of it.
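As an illustration of the ephemeral-.pid-file idea, here is a minimal entrypoint sketch - my own example, not the official image's entrypoint; the path and the `--pid` flag usage are assumptions for illustration:

```shell
#!/usr/bin/env bash
# Illustrative sketch only - not the official airflow image entrypoint.
set -euo pipefail

# /tmp lives on the container's ephemeral filesystem, so the pid file
# disappears automatically when the container is removed.
PID_FILE="/tmp/airflow-worker.pid"

# Clear any stale pid file left over from an unclean stop or restart.
rm -f "${PID_FILE}"

# Hand off to the worker (commented out so the sketch stands alone):
# exec airflow celery worker --pid "${PID_FILE}"
echo "pid file location: ${PID_FILE}"
```

The point is simply that the pid file path is never on a mounted volume, so a container restart always starts clean.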
