On 14/08/2015 01:25, Colin Booth wrote:
> I'm not sure how I feel about having the indestructibility guarantee residing in a service that isn't the root of the supervision tree. I haven't done much with s6-fdholderd but unless there's some extra magic going on in s6rc-fdholderd, if it goes down it won't be able to re-establish its control over the overall communications state due to it creating a fresh socket. I know, I know, it should be fine, but accidents happen.
I've thought about it for a while, and finally decided that the advantages outweighed the drawbacks.

First, the only time this makes a qualitative difference is when the pipe maintainer cannot die at all. In one setup, you lose your pipes when "s6-svscan" dies; in the other setup, you lose your pipes when "s6-fdholderd" dies. The only way to prevent that is to forbid your pipe maintainer from dying entirely.

Second, the only way to do that is to run the pipe maintainer as process 1; but I don't think putting things in process 1 to make them indestructible is the answer. It's the systemd way: "We're process 1, so we cannot die, and we can do everything on the system that needs reliability." Granted, it's a nice thing to have, and I do advocate the use of s6-svscan as process 1, but not because it's a pipe maintainer. I use s6-svscan as process 1 because it's the natural place for the root of a supervision tree; everything else is a bonus.

The logged service feature of s6-svscan is a direct legacy of daemontools. It was very cool at the time because we had nothing else, and I keep it because there's a large daemontools user base: breaking compatibility would make no sense, since the code that handles logged services isn't complex enough to be a maintenance burden. (Still, it is one of the very few places where I had to write a detailed comment labelled BLACK MAGIC, because there *is* some complexity to it.) So it's not going away any time soon, but it's still legacy, ad-hoc functionality. If I were writing s6-svscan today, I would not implement this feature; I would advertise the use of a dedicated fd-holder instead. That would cut the code size of s6-svscan by a non-negligible amount, getting it closer to the ideal of a minimal process 1.

The correct approach to reliability is not to try to force processes not to die, and it's not to cram more stuff into the only process that cannot die.
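The whole pipe-maintainer discussion rests on a plain POSIX property: a pipe lives as long as at least one process holds one of its ends, a reader only sees EOF once every write end is closed, and a writer only gets EPIPE once every read end is closed. Here is a minimal Python sketch of that property, assuming Python 3.9+ on a Unix system. `hand_out`, `fetch` and `demo` are hypothetical names, and passing descriptors with `socket.send_fds`/`socket.recv_fds` (SCM_RIGHTS over a Unix socketpair) only illustrates the general fd-holder idea; it is not s6-fdholderd's actual protocol. The parent plays the holder: two short-lived writers fetch the write end from it, write a line and die, and the pipe survives both deaths. Only when the holder itself closes its copy does the reader see EOF.

```python
import os
import socket

# Hypothetical helpers: pass one descriptor over a Unix socket (SCM_RIGHTS).
# This illustrates the fd-holder idea only; it is NOT s6-fdholderd's protocol.
def hand_out(holder_sock, fd):
    """Holder side: send a duplicate of fd to the peer."""
    socket.send_fds(holder_sock, [b"fd"], [fd])

def fetch(client_sock):
    """Client side: receive exactly one descriptor from the holder."""
    _msg, fds, _flags, _addr = socket.recv_fds(client_sock, 16, 1)
    return fds[0]

def demo():
    r, w = os.pipe()  # the pipe maintained by the holder (this process)
    holder_end, client_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

    # Two successive writer instances: each fetches the write end, writes, dies.
    for msg in (b"writer 1\n", b"writer 2\n"):
        pid = os.fork()
        if pid == 0:
            try:
                # Drop inherited copies so the writer truly depends on the holder.
                os.close(r)
                os.close(w)
                holder_end.close()
                os.write(fetch(client_end), msg)
                os._exit(0)  # writer dies; its copy of the write end goes away
            except BaseException:
                os._exit(1)
        hand_out(holder_end, w)
        os.waitpid(pid, 0)

    # Both writers are dead, but the holder still has w open: no EOF yet.
    os.set_blocking(r, False)
    data, saw_eof = b"", False
    while True:
        try:
            chunk = os.read(r, 64)
        except BlockingIOError:
            break  # pipe is empty but still has a writer (the holder)
        if not chunk:
            saw_eof = True  # would mean no write end exists anywhere
            break
        data += chunk

    # The holder "dies" (closes its copy): now the reader sees EOF.
    os.close(w)
    eof_after_holder_death = os.read(r, 64) == b""
    os.close(r)

    # Mirror case: writing into a pipe that has no reader fails with EPIPE.
    r2, w2 = os.pipe()
    os.close(r2)
    try:
        os.write(w2, b"x")
        got_epipe = False
    except BrokenPipeError:  # CPython ignores SIGPIPE, so EPIPE is raised here
        got_epipe = True
    finally:
        os.close(w2)

    holder_end.close()
    client_end.close()
    return data, saw_eof, eof_after_holder_death, got_epipe
```

Calling `demo()` returns `(b"writer 1\nwriter 2\n", False, True, True)`: both lines arrive, no EOF is seen while the holder keeps the write end, EOF appears once it lets go, and writing to a pipe with no reader raises `BrokenPipeError`. That last point is exactly what makes a pipeline restart itself when its pipe is no longer held by anyone.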
The correct approach is to make sure it's not a serious problem when processes die. And that, btw, is exactly what supervision is about in the first place.

So, let's make sure it's not a problem when the pipe maintainer dies. In this case, let's add a watcher for s6-fdholderd. Instead of oneshots that store pipes into s6-fdholderd, how about filling up s6-fdholderd at start time with all the pipes it needs? The processes in a pipeline will keep using the old pipes until one of them dies, at which point the old pipe, which nothing holds anymore, will close, propagating EOF or EPIPE to the other processes in the pipeline; eventually all the processes in the pipeline will restart and fetch the new set of pipes from s6-fdholderd. That sounds reliable to me, and even cleaner than the current approach, where the services can't reliably restart if s6-fdholderd has died; and it doesn't need additional autogenerated oneshots. (Thanks for the rubber duck debugging! That's a huge part of why I like design discussions.)

So yeah, if s6-fdholderd dies and then one process in a pipeline dies, the whole pipeline will restart. I think it's an acceptable price to pay, and it's the best we can do without involving process 1.

--
Laurent