One small comment - in my current WIP multi-layered Docker file AIP-7 ( https://github.com/apache/airflow/pull/4543) I used dumb-init (developed by Yelp) rather than tini. It's a bit better for pythonic world as it can be installed from PIP: 'dumb-init>=1.2.2', ( https://github.com/apache/airflow/pull/4543/files#diff-2eeaed663bd0d25b7e608891384b7298R311 ),
and the usage is also rather straightforward: ENTRYPOINT ["/usr/local/bin/dumb-init", "--", "/entrypoint.sh"] ( https://github.com/apache/airflow/pull/4543/files#diff-3254677a7917c6c01f55212f86c57fbfR361 ) One small caveat I noticed so far with using proper "init" is that when you want to run interactive shell, you need to override --entrypoint - otherwise the shell has no terminal to use and complains. J. On Tue, Mar 12, 2019 at 4:41 AM Daniel Imberman <daniel.imber...@gmail.com> wrote: > Oh yikes. Thanks for catching this @Jarek! @Greg Neiheisel > <g...@astronomer.io> were there a lot of caveats to deploying from tini > or > would this be a fairly straightforward fix? > > On Sun, Mar 10, 2019 at 5:13 PM Greg Neiheisel <g...@astronomer.io> wrote: > > > I can confirm that wrapping airflow with tini/dumb-init works pretty > well. > > We've been relying on it at Astronomer for the past year. We run > > exclusively on k8s and are restarting airflow pods very frequently. > Here's > > our production 1.10.2 image using tini on alpine, for example > > > > > https://github.com/astronomer/astronomer/blob/master/docker/airflow/1.10.2/Dockerfile#L102 > > . > > > > On Sun, Mar 10, 2019 at 3:49 PM Jarek Potiuk <jarek.pot...@polidea.com> > > wrote: > > > > > NOTE! I changed the subject to not pollute the AIP-12 thread > > > > > > Fokko, > > > > > > I think I know why TERM signals do not work in the current POD > > > implementation. I already experienced that several times - dockerized > > app > > > not receiving TERM signal. The reason was always the same. It is not a > > bug > > > actually - it is expected behaviour in case your ENTRYPOINT is in the > > SHELL > > > form ("/binary arg1 arg2") or when you use shell script as ENTRYPOINT > > first > > > argument in [ "shell script", "arg" ] form. > > > > > > In those cases "shell" process becomes the "0" init process. > > Unfortunately > > > shell process is not prepared to do all the stuff that proper init > > process > > > should be doing: > > > > > > * Inherit orphaned child processes > > > * reap them > > > * Handle and propagate signals properly > > > * Wait until all subprocesses are terminated before terminating itself > > > > > > What happens is that the TERM signal just kills the init "shell" > process > > > but, then the signal does not reach any of its children and the > container > > > continues to run. It's well known problem in docker world and there > are a > > > number of solutions (including exec-ing from shell or - better - using > a > > > dumb-init/tini in your ENTRYPOINT- very tiny "proper" init > > implementations > > > that do what they should do. > > > > > > You can read more for example here: > > > https://hynek.me/articles/docker-signals > > > . > > > > > > J. > > > > > > On Sun, Mar 10, 2019 at 7:29 PM Driesprong, Fokko <fo...@driesprong.frl > > > > > wrote: > > > > > > Ps. Jarek, interesting idea. It shouldn't be too hard to make Airflow > > more > > > > k8s native. You could package your dags within your container, and > do a > > > > rolling update. Add the DAGs as the last layer, and then point the > DAGs > > > > folder to the same location. The hard part here is that you need to > > > > gracefuly restart the workers. Currently AFAIK the signals given to > the > > > pod > > > > aren't respected. So when the scheduler/webserver/worker receives a > > > > SIGTERM, it should stop the jobs nicely and then exit the container, > > > before > > > > k8s kills the container using a SIGKILL. This will be challenging > with > > > the > > > > workers, which they are potentially long-running. Maybe stop kicking > > off > > > > new jobs, and let the old ones finish, will be good enough, but then > we > > > > need to increase the standard kill timeout substantially. Having this > > > would > > > > also enable the autoscaling of the workers > > > > > > > > > > > > > -- > > > > > > Jarek Potiuk > > > Polidea <https://www.polidea.com/> | Principal Software Engineer > > > > > > M: +48 660 796 129 <+48660796129> > > > E: jarek.pot...@polidea.com > > > > > > > > > -- > > *Greg Neiheisel* / CTO Astronomer.io > > > -- Jarek Potiuk Polidea <https://www.polidea.com/> | Principal Software Engineer M: +48 660 796 129 <+48660796129> E: jarek.pot...@polidea.com