Re: s6-rc transition failures
Hello Laurent,

Thanks for your answers. Other opinions/experience welcome.

> That is a fair point. Normally, you should adjust the s6-rc
> timeouts (both the global one and the service-specific one) to
> make sure s6-rc does *not* time out before the service is ready -
> but if there's an unexpected significant delay, the situation can
> happen.

Just to be clear I am talking about a service going into an infinite loop or deadlock. Obviously a bad service but I want to protect my system against it.

> What I can do is add an option to s6-rc to make it explicitly send
> a s6-svc -d to a service that times out before reaching readiness:
> ensure that a service is either ready in time, or definitely down.
> Would that help?

Yes that would help. I suppose you also mean to wait for the service to go down before returning?

> The annoying thing is it can't be symmetrical: when a down
> transition times out, there's no way I'm going to start the service
> again. :) But generally, a down transition timing out signifies a
> badly written finish script, or badly calibrated timeouts, and
> it can be easily solved by running s6-rc -d change again.

I agree. I would add that if timeout-down > timeout-kill + timeout-finish + some margin, the down transition should generally never time out.

> What I can do is add a bit of signal handling to s6-rc, so that if
> it gets interrupted, say with a SIGINT or SIGTERM, it exits ASAP,
> while still ensuring consistency of the service states.

I was thinking exactly the same :). I even think this could be tailored to system shutdown (I do not see another use case). E.g. for ongoing longrun up transitions, s6-rc could act as if the transition timed out and send "s6-svc -d". For ongoing longrun down transitions I am not sure whether it should wait for it to complete or not.

> Unfortunately, for oneshots it would mean waiting for the current
> transitions to finish before exiting - s6-rc has no way to interrupt
> a running oneshot, and adding one (making s6rc-oneshot-runner kill
> all its children) would not help, because until the oneshot script
> exits, it is not visible from the outside whether it has accomplished
> its transition or not - so the state would still be undetermined.

I tend to think this would not be too much of a problem as I picture oneshots as having timeout-up and timeout-down of a few seconds, as opposed to longruns having timeouts of one or two minutes. But this assumption may be totally wrong.

> Also, state consistency cannot be 100% ensured, because s6-rc could
> still receive a SIGKILL - but if you kill -9 s6-rc, you deserve
> trouble.

I won't kill -9 s6-rc, I promise.

Kr,
Lionel
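The rule of thumb above (timeout-down > timeout-kill + timeout-finish + some margin) can be checked mechanically. The following is only a sketch: it assumes a longrun definition directory that contains all three files with nonzero values in milliseconds, and the function name and the 1000 ms default margin are made up for illustration. A value of 0 has a special meaning for the s6 tools and is not handled here.

```shell
# check_down_timeout DIR [MARGIN_MS]
# DIR is assumed to hold the files timeout-down, timeout-kill and
# timeout-finish, each containing a number of milliseconds.
check_down_timeout() {
    dir=$1
    margin=${2:-1000}   # hypothetical default margin, in ms

    for f in timeout-down timeout-kill timeout-finish; do
        [ -r "$dir/$f" ] || { echo "missing $f" >&2; return 2; }
    done

    down=$(cat "$dir/timeout-down")
    kill_ms=$(cat "$dir/timeout-kill")
    finish=$(cat "$dir/timeout-finish")
    needed=$((kill_ms + finish + margin))

    if [ "$down" -gt "$needed" ]; then
        echo "ok: timeout-down=$down > $needed"
    else
        echo "too short: timeout-down=$down <= $needed"
        return 1
    fi
}
```

It could be run over every longrun definition before compiling the service database, so that a badly calibrated down timeout is noticed before it bites at shutdown.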
s6-rc transition failures
Hello all,

I am facing questions regarding the way to correctly handle transition failures with s6-rc. The new permanent failure feature already clarifies some scenarios but I still have doubts about some cases. Below are two concrete examples. I would be happy to have remarks or suggestions about how to cope with them cleanly :).

1. I start a longrun service with "s6-rc -u change svc". This service hangs and never reaches readiness notification. After the timeout, s6-rc will declare the transition a failure. But the process is actually running and I have no way to stop it through s6-rc. The only way is to issue "s6-svc -d /path/to/svc". But then I have the feeling I am doing something behind the back of s6-rc to unblock the situation because s6-rc cannot handle it. Somehow I wish s6-rc had an extra state BUSY between DOWN and UP when the transition is ongoing, and that it could bring me back to DOWN when I ask it to.

2. Slightly related, I have an issue with system shutdown. I am working on a buildroot system and specifically I use the /etc/rc.tini which can be found here [1] and which is executed as part of the shutdown sequence of the system. The problem is with the invocation of "s6-rc -b -da change" (I added the -b). If there is already an s6-rc ongoing, the shutdown sequence will be blocked until the first s6-rc times out. And this kind of timeout is of the order of minutes as I have slow services depending on each other.

I currently think the best thing to do is to "killall s6-rc" before calling "s6-rc -ad change". This leaves a little race condition possible, but more importantly, I have concerns about killing an ongoing s6-rc. This will leave longrun services in the middle of a state transition - there is the connection with the first scenario - and I expect the final effect is that the finish script will not be executed before the system goes down, even though running it is precisely what I want to happen when I call "s6-rc -ad change".

Secondly, I do not know what effect this will have on oneshots. I fear "/etc/init.d/S98xxx start" will still be running and "/etc/init.d/S98xxx stop" will be executed - the thought of which horrifies me beyond reasoning.

[1] https://github.com/elebihan/s6-br2-init-skeleton/blob/master/data/skeleton/etc/rc.tini

Thanks in advance for any idea that might help me :).

Kr,
Lionel
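The manual recovery described in scenario 1 can be wrapped in a small helper. This is a sketch of the workaround only, not an s6-rc feature: it still acts behind the back of s6-rc, the function name and default timeouts are invented, and the service directory path has to be supplied by the caller because it depends on how the live directory is set up. The S6_RC and S6_SVC variables exist only so the commands can be substituted.

```shell
S6_RC=${S6_RC:-s6-rc}
S6_SVC=${S6_SVC:-s6-svc}

# up_or_down NAME SERVICEDIR [UP_TIMEOUT_MS] [DOWN_TIMEOUT_MS]
# Try to bring NAME up through s6-rc. If the transition fails or times
# out, bring the supervised process down with s6-svc and wait for it,
# so the real state matches the "down" state s6-rc has recorded.
up_or_down() {
    name=$1
    svdir=$2
    up_ms=${3:-60000}     # hypothetical default: 1 minute
    down_ms=${4:-10000}   # hypothetical default: 10 seconds

    if "$S6_RC" -t "$up_ms" -u change "$name"; then
        return 0
    fi

    # -d: bring the service down; -wD: wait until it is really down
    # (finish script included); -T: give up waiting after down_ms.
    "$S6_SVC" -wD -T "$down_ms" -d "$svdir"
    return 1
}
```

For example, "up_or_down svc /path/to/svc" returns 0 when svc became ready in time and 1 when it had to be forced back down.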
Re: [announce] skarnet.org Spring 2017 release: s6 init fails with buildroot on MIPS
Hello all,

After upgrading to the below release of skarnet packages, my system is not booting anymore on MIPS. I am using buildroot and the s6-linux-init-skeleton provided there by Éric Le Bihan.

The init usually stays blocked there:

    s6-rc: info: processing service fdholder: starting

with the following processes running:

    136 root s6-rc -v 3 -u change services-all
    143 root s6-fdholderd -1 -i data/rules
    146 root s6-ipcserverd -1 -- s6-ipcserver-access -v0 -E -l0 -i data/rules -- s6-sudod -t 2000 -- /usr/libexec/s6-rc-oneshot-run -l
    173 root s6-svlisten1 -u -- /run/s6-rc/scandir/klogd-log s6-svc -u -- /run/s6-rc/scandir/klogd-log
    183 root s6-ftrigrd
    296 root s6-svlisten1 -U -- /run/s6-rc/scandir/fdholder s6-svc -u -- /run/s6-rc/scandir/fdholder
    297 root s6-ftrigrd

and I regularly see the following message, even though not consistently:

    s6-ftrigrd: fatal: unable to flush asyncout: Broken pipe

When I build the very same system for x86 I do not see any issue, it is very stable, so I suspect a cross-compilation issue. Since buildroot patches skalibs in order to determine system capabilities through build-time asserts instead of runtime tests, it is possible that something goes wrong there. However I have adapted the patches to the new release, and so far I cannot identify any error in the detected system settings.

If anyone has any idea to help the investigation, it would be highly appreciated :). Maybe for example a hint on what could cause s6-ftrigrd to get a broken pipe?

Kr,
Lionel

From: supervision@list.skarnet.org on behalf of Jean Louis
Sent: Tuesday, March 28, 2017 3:45 PM
To: Laurent Bercot
Cc: skaw...@list.skarnet.org; supervision@list.skarnet.org
Subject: Re: [announce] skarnet.org Spring 2017 release

By the way, I have upgraded, without control, I see it is working well and fine on a reboot.

On Tue, Mar 28, 2017 at 12:27:28PM +, Laurent Bercot wrote:
> > * s6-2.5.0.0
> > --
> And obviously I forgot to mention the important change for users:
> s6-svstat can now print programmatically parsable output, via a new
> "-o field" option (and shortcuts for common fields). This feature was
> asked for a long time ago.
>
> --
> Laurent
>
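The "-o field" option mentioned in the quoted announcement lends itself to scripting. The sketch below assumes that "s6-svstat -o up,ready" prints the two requested fields on one line as "true" or "false" separated by a space, which is how the s6-svstat documentation describes it as far as I can tell; the function name is invented, and the S6_SVSTAT variable exists only so the command can be substituted.

```shell
S6_SVSTAT=${S6_SVSTAT:-s6-svstat}

# is_up_and_ready SERVICEDIR
# Returns 0 if the service is up and has notified readiness,
# 1 if it is not, and 2 if s6-svstat itself failed.
is_up_and_ready() {
    # Assumed output format: "<up> <ready>", e.g. "true false"
    state=$("$S6_SVSTAT" -o up,ready "$1") || return 2
    [ "$state" = "true true" ]
}
```

With the paths from the process listing above, "is_up_and_ready /run/s6-rc/scandir/fdholder" would report whether fdholder ever became ready, which is the event the blocked s6-svlisten1 -U is waiting for.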
Re: Customise shutdown signal at the s6-rc level?
> Doesn't
>
>     svc -wD -T1000 servicedir || svc -k servicedir
>
> do what you want for the "hangup problem"?

Problem is that you can't trigger that from s6-rc.

Lionel
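For reference, the quoted one-liner written out with s6-svc could look like the sketch below. It has exactly the limitation pointed out in the reply: it operates on the service directory directly and cannot be triggered from s6-rc. The function name and the 1000 ms default are made up, and the S6_SVC variable exists only so the command can be substituted.

```shell
S6_SVC=${S6_SVC:-s6-svc}

# stop_or_kill SERVICEDIR [GRACE_MS]
# Ask the service to go down and wait up to GRACE_MS milliseconds.
# If it is still not down by then, send SIGKILL and wait again.
stop_or_kill() {
    svdir=$1
    grace_ms=${2:-1000}   # hypothetical default grace period

    # -d: bring the service down; -wD -T: wait for "really down",
    # but no longer than grace_ms.
    "$S6_SVC" -wD -T "$grace_ms" -d "$svdir" && return 0

    # Grace period expired: -k sends SIGKILL to the service process.
    "$S6_SVC" -wD -T "$grace_ms" -k "$svdir"
}
```

The return status is that of the last s6-svc call, so a nonzero result means the service was still not down even after SIGKILL and a second wait.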
Re: Customise shutdown signal at the s6-rc level?
What I would like to have is "s6-rc -d change foo" sending SIGTERM and then SIGKILL if the service is not down after x seconds.

Currently, if a daemon hangs, it has annoying side effects. If I don't put a timeout on the s6-rc command, the state machine is blocked and I cannot shut down the system anymore. If I do put a timeout, s6-rc considers the service down while it is not. It is then impossible to stop or restart it.

When the timeout occurs, maybe s6-rc should send SIGKILL to make sure the service is down?

Lionel

From: supervision@list.skarnet.org on behalf of Laurent Bercot
Sent: Tuesday, May 2, 2017 10:51:19 AM
To: supervision@list.skarnet.org
Subject: Re: Customise shutdown signal at the s6-rc level?

>[1]

Damn ezmlm-cgi bug, this time it triggered without an accented character. Sorry about that :(

Here's the URL to the full message:
https://www.mail-archive.com/supervision@list.skarnet.org/msg01427.html

About customizing shutdown signals in s6-rc: how do you suggest it should be done? There are two ways I can see it work:

1. by including a call to the "trap" binary in the generated run script.
2. by including the "what signal should I send" information into the generated service database and making "s6-rc -d change foo" use that information.

Solution 1 means adding magic that changes the process tree, and I'm very reluctant to do that. s6-rc-compile already performs magic with run scripts (to move around fds for pipelining and notification), but it doesn't change the final process tree: the long-lived process *is* the user's run script. Adding a call to trap would break that expectation, and I don't think it's worth it: users who need it can perform the trapping in their run script themselves. (It's trivial to do in shell.)

Solution 2 means changing the s6-rc database ABI by adding a field (which is doable at the cost of a major version bump and recompilation of every user database), but more importantly, it means that s6-svc -d is not automatically valid anymore, and more investigation (looking into the s6-rc database to find the correct signal...) is necessary for a user to know the right way to temporarily stop a service. It's an additional know-how burden I don't want to put on users, most of whom are already having a hard enough time with s6 as is.

Neither of those solutions is appealing to me. Currently, if someone needs to use a different shutdown signal, I would recommend that they manually perform a trap in their run script, be it with s6 or with s6-rc.

If I were to work on a more official, better integrated solution, I would do it at the s6-supervise level. I would not implement custom control scripts, for the reasons indicated in the above link, but it would probably be possible to implement a safer solution, such as reading a file containing the name of the signal to send when s6-svc -d is called.

Is there a real, important demand for this? I'd rather not do it and fix daemons that don't use SIGTERM to shut down instead...

--
Laurent
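The "perform the trapping in the run script" recommendation could look roughly like the sketch below. It is one possible shape, not something taken from s6 or s6-rc: the function name is invented, and the daemon is assumed to stay in the foreground. As noted above, this changes the process tree: the supervised process becomes the shell, with the daemon as its child.

```shell
# forward_term_as SIGNAL COMMAND [ARGS...]
# Run COMMAND as a child and, when this shell receives SIGTERM,
# send SIGNAL to the child instead. Returns the child's exit status.
forward_term_as() {
    sig=$1
    shift
    term_seen=0

    "$@" &
    child=$!

    # On SIGTERM, remember it and forward the replacement signal.
    trap 'term_seen=1; kill -s "$sig" "$child"' TERM

    # wait returns early (status > 128) when the trap fires,
    # so wait a second time to collect the child's real status.
    wait "$child"
    status=$?
    if [ "$term_seen" -eq 1 ]; then
        wait "$child"
        status=$?
    fi

    trap - TERM
    return "$status"
}
```

A run script would end with something like "forward_term_as USR1 /path/to/daemon", where the daemon path is a placeholder. One caveat: a non-interactive shell starts background commands with SIGINT and SIGQUIT ignored, so translating to one of those two only works if the daemon installs its own handler for it.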