On 16/07/2015 19:22, Colin Booth wrote:
You're right, ./run is up, and being in ./finish doesn't count as up. At work we use a lot of runit and have a lot more services that do cleanup in their ./finish scripts so I'm more used to the runit handling of down statuses (up for ./run, finish for ./finish, and down for not running). My personal setup, which is pretty much all on s6 (though migrated from runit), only has informational logging in the ./finish scripts so it's rare for my services to ever be in that interim state for long enough for anything to notice.
I did some analysis back in the day, and my conclusion was that admins really wanted to know whether their service was up as opposed to... not up; and the finish script is clearly "not up". I did not foresee a situation like a service manager, where you would need to wait for a "really down" event.
As for notification, maybe 'd' for when ./run dies, and 'D' for when ./finish ends. Though since s6-supervise SIGKILLs long-running ./finish scripts, it encourages people to do their cleanup elsewhere, and as such removes the main reason why you'd want to be notified when your service is really down. If the s6-supervise timer weren't there, I'd definitely suggest sending some message when ./finish went away.
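A listener for such events would presumably go through s6's ftrig mechanism, subscribing to the service's event fifodir. This is only a sketch: the 'D' event is the hypothetical one proposed above, and the fifodir path is an assumption — on a real system you would wait for the existing 'd' event (./run died) instead.

```sh
# Hedged sketch: block until the service is *really* down.
# 'D' (./finish ended) is the hypothetical "burial notification"
# discussed above; 'd' (./run died) is what exists today.
# The servicedir path is illustrative.
s6-ftrig-wait /run/service/myserver/event D
echo "myserver is really down: ./finish has ended"
```

The same mechanism is what s6-svwait and s6-svlisten1 use under the hood, so a service manager waiting for a "really down" state would just wait on a different event character.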
Yes, I've gotten some flak for the decision to put a hard time limit on ./finish execution, and I'm not 100% convinced it's the right decision - but I'm almost 100% convinced it's less wrong than just allowing ./finish to block forever. ./finish is a destroyer, just like close() or free(). It is nigh impossible to define sensible semantics that allow a destroyer to fail, because if it does, then what do you do? void free() is the right prototype; int close() is a historical mistake. Same with ./finish; nobody tests ./finish's exit code, and that's okay. But since ./finish is a user-provided script, it has many more failure modes than just exiting nonzero - in particular, it can hang (or simply run for ages). The problem is that while it's alive, the service is still down, and that's not what the admin wants. Long-running ./finish scripts are almost always a mistake. And that's why s6-supervise kills ./finish scripts so brutally.

I think the only satisfactory answer would be to leave it to the user: keep killing ./finish scripts on a short timer by default, but have a configuration option to change the timer or remove it entirely. And with such an option, a "burial notification" when ./finish ends becomes a possibility.
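The natural shape for such a knob is a per-service override file next to ./run and ./finish. As it happens, s6 later grew exactly this: an optional timeout-finish file in the service directory, holding a value in milliseconds, with 0 meaning no limit. A sketch of the layout (file names per the s6 servicedir convention; the service name is illustrative):

```sh
# Service directory with a configurable ./finish grace period.
myserver/
  run              # execs the daemon
  finish           # quick cleanup only - should exit fast
  timeout-finish   # e.g. contains "10000": allow ./finish 10 seconds
                   # before s6-supervise SIGKILLs it; "0" would
                   # disable the timeout entirely (use with care)
```

With the default short timer kept in place, admins who genuinely need long cleanups opt in explicitly, and everyone else keeps the "down means down soon" guarantee.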
Ah, gotcha. I was sending explicit timeout values in my s6-rc commands, not using timeout-up and timeout-down files. Assuming -tN is the global value, then passing that along definitely makes sense, if nothing else to bring its behavior into line with the behavior of timeout-up and timeout-down.
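To make the two mechanisms concrete: per-service timeouts live as timeout-up and timeout-down files in the s6-rc source definition directory (milliseconds, 0 meaning no timeout), while -t on the s6-rc command line is a single global timeout for the whole transition. A small sketch (service name and values are illustrative):

```sh
# Per-service timeouts in an s6-rc source definition directory:
echo 3000 > myserver/timeout-up     # 3s to come up
echo 5000 > myserver/timeout-down   # 5s to go down

# Versus a global timeout for the entire state change:
s6-rc -t 10000 -u change myserver
```

The per-service files bound each individual transition; -t bounds the whole s6-rc invocation, which matters when a change touches a dependency chain of several services.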
Those pesky little s6-svlisten1 processes will get nerfed.
Part of my job entails dealing with development servers where automatic deploys happen pretty frequently but service definitions don't change too often. So having non-privileged access to a subsection of the supervision tree is more important than having non-privileged access to the pre- and post-compilation offline stuff.
I understand. I guess I can make s6-rc-init and s6-rc 0755 while keeping them in /sbin, where Joe User isn't supposed to find them.
By the way, that's less secure than running a full non-privileged subtree.
Oh, absolutely. It's just that a full setuidgid subtree isn't very common - but for your use case, a full user service database makes perfect sense. -- Laurent