[announce] skarnet.org June 2024 update

2024-06-07 Thread Laurent Bercot



 Hello,

 New versions of some skarnet.org packages are available. This is a
light update, focused on quality of life and improved support for old
platforms, in preparation for larger updates later. The
exception is a breaking change to s6, which adds support for addressing
a service's process group - useful e.g. if you want to kill straggler
processes when the service dies and you're not using cgroups.

skalibs-2.14.2.0              (minor)
execline-2.9.6.0              (minor)
s6-2.13.0.0                   (major)
s6-rc-0.5.4.3                 (release)
s6-dns-2.3.7.2                (release)
s6-networking-2.7.0.3         (release)
mdevd-0.1.6.4                 (release)
smtpd-starttls-proxy-0.0.1.4  (release)
tipidee-0.0.5.0               (minor)
shibari-0.0.1.1               (release)

 Details below. If there are no details for a package, the release
only contains bugfixes and/or code cleanups.


 * skalibs-2.14.2.0
   

 This version underlies all the changes in the other packages. It brings
support for the midipix environment and old MacOS versions; it adds
more cspawn functions for improved posix_spawn() use.

 https://skarnet.org/software/skalibs/
 git://git.skarnet.org/skalibs


 * execline-2.9.6.0
   

 This version adds "elglob -d" for encoding the result of a glob into
a single word. It also adds "importas -S" to import a variable with
the same name as the substitution value, just like the old "import"
command.
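
 For illustration, an untested one-liner sketch of the new shortcut
(PATH is just an example variable; the /command shebang is the usual
skarnet convention, adjust to your layout):

  #!/command/execlineb -P
  # importas -S PATH is shorthand for "importas PATH PATH": it imports
  # the PATH environment variable into a substitution variable of the
  # same name.
  importas -S PATH
  echo $PATH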

 https://skarnet.org/software/execline/
 git://git.skarnet.org/execline


 * s6-2.13.0.0
   ---

 s6-supervise now tracks the service's process group! It is passed to
the finish script as the 4th argument, so finish scripts can kill the
whole process group if the service leaves unwanted children behind.
Note that this isn't completely foolproof: a very badly behaved service
can create children in different process groups. But daemons don't
normally mess with process groups; and any alternative is non-portable.
 While the service is running, s6-svstat -o pgid prints the process
group id, and s6-svc -K can send a SIGKILL to the process group (as
well as -P for SIGSTOP and -C for SIGCONT).
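
 For illustration, an untested finish script sketch using the new
argument (the guard against pgid values of 0 or 1 is a defensive
assumption, and it assumes an external kill binary that accepts a
negative pgid):

  #!/command/execlineb -S0
  # Arguments passed by s6-supervise to ./finish:
  #   $1 = exit code, $2 = signal, $3 = service name,
  #   $4 = process group id of the service that just died
  # If we got a usable pgid, SIGKILL any stragglers left in that
  # process group, then exit 0 so the service restarts normally.
  ifelse { eltest $4 -le 1 } { exit 0 }
  foreground { kill -9 -- -$4 }
  exit 0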

 Additionally, a small bug has been fixed and an arbitrary limit has
been lifted in s6-ftrigrd: a service can now wait on as many fifodirs
as it wants.

 Your supervision trees need to be restarted after you upgrade to the
new version.

 https://skarnet.org/software/s6/
 git://git.skarnet.org/s6


 * s6-rc-0.5.4.3
   -

 An arbitrary limit has been lifted: the internal fd-holder can now
be autorefilled with as many pipes as it can hold.

 https://skarnet.org/software/s6-rc/
 git://git.skarnet.org/s6-rc


 * s6-dns-2.3.7.2
   --

 Refactor of some APIs to allow for dns-0x20 implementation, which is
useful for interoperating with recent versions of Unbound and
similarly paranoid software.

 https://skarnet.org/software/s6-dns/
 git://git.skarnet.org/s6-dns


 * tipidee-0.0.5.0
   ---

 tipidee.conf now accepts "" as an extension indicator for the
content-type directive, which allows the user to specify a default
Content-Type for files without an extension (previously, the
application/octet-stream default was not overridable).

 https://skarnet.org/software/tipidee/
 git://git.skarnet.org/tipidee


 * shibari-0.0.1.1
   ---

 shibari now implements dns-0x20, which means the latest Unbound
resolvers won't fail when querying it.

 https://skarnet.org/software/shibari/
 git://git.skarnet.org/shibari


 Enjoy,
 Bug-reports welcome.

--
 Laurent



Re: s6 daemon restart feature enhancement suggestion

2024-05-26 Thread Laurent Bercot
Let me say that a $daemon i.e. wpa_supplicant or iwd providing 
$service=WiFi{wpa} has been pulled into the s6-rc compiled db and 
started in the supervision tree.
But the system doesn't have the hardware to support that, or some 
important resource is unavailable.


 So, here's my question: if the system doesn't have the hardware to
support that, why is the daemon in the database in the first place?

 s6-rc, in its current incarnation, is very static when it comes to its
service database; this is by design. The point is that when you have a
compiled service database, you know what's in there, you know what it
does, and you know what services will be running when you boot your
system.
 Adding dynamism goes against that design. I understand the value of
flexibility (this is why most distributions won't use s6-rc as is: they
need more flexibility in their service manager) but there's a trade-off
with reliability, and s6-rc weighs heavily on the reliability side.

 If you are building a distribution aimed at supporting several kinds
of hardware, I suggest adding flexibility at the *source database*
level, and building the compiled database at system configuration time
(or, in extreme cases, at boot time, though I do not recommend that if
you can avoid it, since you lose the static bootability guarantee).

 If your machine can't run wpa_supplicant, then the service manager
should not attempt to run wpa_supplicant in the first place, so the
wpa_supplicant service should not appear in the top bundle.

 Lacking resources is a different issue: it's a temporary error, and
it makes sense for the service to fail (and be restarted) if it cannot
reserve the resources it needs. If you want to report permanent
failure, and stop trying to bring the service up, after a certain amount
of time, you can write a timeout-up file, or have a finish script exit
125, see below.


A mechanism should be prepared, to let $daemon inform its instance of
s6-supervise that it can't run, or can't provide $service / its
services.


 If you have the information before the machine boots, you should use
the information to prune your service database, and compile a database
that you know will work with your system.

 If you don't have the information before the machine boots, then a
service failing to start is a normal temporary failure, and s6 will
attempt to restart the service until it reports permanent failure.

 You have several ways of marking a service as permanently failed:

 - (only with s6-rc) you can have a timeout-up file, see
 https://skarnet.org/software/s6-rc/s6-rc-compile.html and look for
"timeout-up"

 - (generic s6) you can have a finish script that uses data that has
been collected by s6-supervise to determine whether a permanent failure
should be reported or not. A finish script can report permanent failure
by exiting 125.
 For instance, using s6-permafailon, see
 https://skarnet.org/software/s6/s6-permafailon.html , allows you to
tell s6 that if the service exits nonzero too many times in a given
number of seconds, then it's hopeless.
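
 For illustration, an untested finish script sketch along those lines
(the thresholds and the event are arbitrary examples; check the
s6-permafailon page for the exact event-list syntax):

  #!/command/execlineb -S0
  # If the service died with exit code 1 at least 5 times in the last
  # 60 seconds, s6-permafailon exits 125, which marks the service as
  # permanently failed; otherwise it execs into the rest of the line
  # and the finish script exits 0, letting s6 restart the service.
  s6-permafailon 60 5 1
  exit 0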

 Does this help?

--
 Laurent



Re: Building lh-bootstrap-master

2024-04-29 Thread Laurent Bercot

Should the lh-bootstrap README be changed so that:


 Yes, I updated the README.md, good point.



Re: Building lh-bootstrap-master

2024-04-26 Thread Laurent Bercot

Indeed, my version of make doesn't have the --jobserver-style option. If I 
remove the reference to --jobserver-style option from ./make the package builds 
without any obvious errors but doesn't appear to be a complete build.


 Yeah, for some reason parallel builds weren't fully working before
this option (I knew more when I debugged this but I've since
forgotten). I don't think I realized the option was so recent.
As Alexis says, your best choice is to use GNU Make 4.4, but if
you cannot, the workaround is not to use any -j option to make,
which should eventually, very slowly, get you a complete build.

--
 Laurent



Re: Sample Configuration

2024-04-07 Thread Laurent Bercot

It is a very elegant implementation of s6


 Vocabulary nitpick: no, 66 is not an "implementation of s6".

 s6 is not a specification. It is a software package. So, there is no
other "implementation of s6" than the s6 software package itself.

 66 is a service manager that runs on top of the s6 process supervision
suite. It is an implementation of the "service manager" concept. It uses
the s6 interface.

 Just making sure we're using the correct terms. It helps prevent
misunderstandings. :)

--
 Laurent



Re: Sample Configuration

2024-04-07 Thread Laurent Bercot

was probably going to use a simple runit configuration because the s6 universe 
seemed too complex for me to figure out in a reasonable amount of time. I haven't


 A basic s6 system is barely more complex than a runit system. This page
should help you start:
 https://skarnet.org/software/s6-linux-init/quickstart.html

 HTH,

--
 Laurent



Re: Sample Configuration

2024-04-07 Thread Laurent Bercot

I'm building a Linux system based on musl and Busybox. I'm considering using 
s6/s6-rc as an init/supervision system. I see there are good docs on all the 
skarnet.org programs, but I can't find an example of a working configuration
that puts them together. Is there an example available somewhere?


 In addition to everything that has been said, you could check out
lh-bootstrap. https://github.com/skarnet/lh-bootstrap

 It's a tool I use to build a complete VM using musl, busybox and
skarnet.org utilities from scratch, so I can test it under qemu. The
layout of the filesystem there should give you an idea of how I intend
the whole thing to be used. ;)

--
 Laurent



Re: s6-rc user services on Gentoo

2024-04-06 Thread Laurent Bercot

I have been using `--print-pid=3` as readiness notification for dbus-daemon
for quite a while now on my user services and I haven't had any problems
with it so far. IIRC, looking at dbus-daemon code, it actually prints the
socket address first then its pid. So, I use the `--print-address=` option
to save the socket address to a file for consumption by s6-envdir.


 Interesting. But it's also an implementation detail, because
dbus-daemon could print its pid at any time, no matter its readiness
state. That's why I chose to go with --print-address - but of course
reading the code instead of speculating is the right thing to do :)

--
 Laurent



Re: s6-rc user services on Gentoo

2024-04-06 Thread Laurent Bercot




But then, there is a problem if one actually wants the server address
information that --print-address provides. Alexis' 'run' script for
example wants to save that to a file (apparently in a directory
suitable for s6-envdir). If the output is sent to the notification
pipe instead, s6-supervise will 'eat' it while waiting for the final
newline character, and then the information is lost.


 That is true. The option is only usable as readiness notification if
you don't otherwise need the information.



And more generally, there's also the question about how 'ready'
dbus-daemon actually is when the point in its code that prints the
server address is reached. I can't really say without looking at the
code; dbus-daemon has many duties once its UNIX domain socket is
bound.


 That doesn't matter. The important thing with readiness is that clients
need to be able to send messages to the daemon. As soon as the socket is
bound, it's okay; if the daemon hasn't finished initializing yet, then
that's what kernel socket buffers are for.

 Of course, if the daemon still needs to allocate resources and can
*fail* after binding the socket and printing the line, that's not good,
but well, it's a balance between reliability and simplicity.

--
 Laurent



Re: s6-rc user services on Gentoo

2024-04-03 Thread Laurent Bercot




2) The presence of a notification-fd file tells s6 that dbus-daemon
can be somehow coerced into producing an s6-style readiness
notification using file descriptor 3 without changing its code, are
you sure that's the case with this script? My service definition for
the system-wide message bus polls for readiness using s6-notifyoncheck
and a dbus-send command...


 "dbus-daemon --print-address=3" should produce a suitable notification.
The address of the bus can only be printed once the bus exists. :)
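
 For illustration, an untested sketch of what the service directory
could contain (user session bus assumed; adjust the options to your
setup): a ./notification-fd file holding the number 3, and a ./run
script like

  #!/command/execlineb -P
  # dbus-daemon writes the bus address, then a newline, to fd 3;
  # s6-supervise reads it as the readiness notification because
  # ./notification-fd contains "3".
  fdmove -c 2 1
  dbus-daemon --session --nofork --nopidfile --print-address=3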

--
 Laurent



Re: s6-rc: dependencies across scandirs

2024-01-03 Thread Laurent Bercot

Hi, I would like to have some kind of dependency handling across
multiple scandirs.

For example an instanced service that starts and waits for a dependency
before the first instance has been started.


 It's generally a bad idea to add, in a service's run script itself,
conditions for your service to work, or to restart. It tends to make
your service *less* reliable, instead of *more*.
 Supervision is about keeping the services up, not monitoring when they
should or should not run. The latter is for service management.

 Instead, design your services so they exit if they have unmet hard
dependencies. The supervisor will try to bring them up again and again,
and it will eventually work, when the dependencies are resolved. The
job of avoiding too much looping, again, belongs to a service manager.



A more complicated variation I would like to have is a user service
depending on a system service (and starting it!). For example a user
service for sway depends on the system service for seatd.


 Start your sway. If seatd isn't up, sway will exit, and the supervisor
will restart it. This works as intended.

 Now, start propagation is a different question, and a more difficult
one, because it crosses privilege boundaries. Do you want users to be
able to start system services? To stop them? In the general case, no,
you certainly don't.

 In this particular case, you probably want seatd to be controllable
by the user at the console. That's what s6-svperms is for:
https://skarnet.org/software/s6/s6-svperms.html

 When relevant (at user login time? earlier?) make seatd controllable
by a group only the console user is in. Then you can simply add a
s6-svc -u for seatd in your sway run script, if that's what you need.
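
 For illustration, an untested run script sketch (the /run/service/seatd
path is an assumption; use your actual scandir):

  #!/command/execlineb -P
  # Ask the system supervisor to bring seatd up (allowed because
  # s6-svperms made the service directory controllable by our group),
  # then start sway. If seatd isn't ready yet, sway exits and gets
  # restarted by its own supervisor.
  foreground { s6-svc -u /run/service/seatd }
  sway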

--
 Laurent



Re: OpenVPN runit service flooding tty

2023-12-03 Thread Laurent Bercot




Hi, I am using the Artix Linux runit service for OpenVPN, available here:
https://gitea.artixlinux.org/packages/openvpn-runit


 Sounds like two compounding problems:
 1. the default logging place for the runit supervision tree is the same
terminal you're using to log in
 2. there is no dedicated logger for the openvpn service.

 IIRC, 1. is intrinsic to runit: there is no way to redirect the default
logging, and that means your console is at the mercy of verbose
services - so every service needs a dedicated logger.
 You can use --syslog as a workaround indeed, but 2. should be fixed: a
runit service should use the runit logging facility. That is something
you should report and suggest to the maintainer of the openvpn service
in Artix.

--
 Laurent



[announce] s6-2.12.0.2

2023-11-20 Thread Laurent Bercot



 Hello,

 I don't normally spam all of you for bugfix releases, but this one is
important. You definitely want to grab the 2.12.0.2 version of s6, not
the 2.12.0.1 one. The bug could prevent a shutdown from completing.

 https://skarnet.org/software/s6/
 git://git.skarnet.org/s6

 Sorry about that,

--
 Laurent



[announce] but what about *second* skarnet.org November 2023 release?

2023-11-19 Thread Laurent Bercot



 Hello,

 New versions of some skarnet.org packages are available.
 This is mostly a bugfix release, addressing the problems that were
reported since the big release two weeks ago.

 Despite that, s6-dns got a minor version bump because the fixes
needed an additional interface; and s6-networking got a major bump,
because it needed an interface change. Nothing that *should* impact
you, the changes are pretty innocuous; but see below.

skalibs-2.14.0.1       (release)
s6-2.12.0.1            (release)
s6-dns-2.3.7.0         (minor)
s6-networking-2.7.0.0  (major)
tipidee-0.0.2.0        (minor)


 * skalibs-2.14.0.1
   

 This release is important if you want the fixes in s6-dns: the
ipv6 parsing code has been revamped.

 https://skarnet.org/software/skalibs/
 git://git.skarnet.org/skalibs


 * s6-2.12.0.1
   ---

 It's only a bugfix, but you want to grab this version, because the
bug was impactful (s6-svscanctl -an not working as intended).

 https://skarnet.org/software/s6/
 git://git.skarnet.org/s6


 * s6-dns-2.3.7.0
   --

 - The parsing of /etc/hosts now ignores link-local addresses instead
of refusing to process the whole file.
 - New interface to only process /etc/hosts if a client requires it.

 https://skarnet.org/software/s6-dns/
 git://git.skarnet.org/s6-dns


 * s6-networking-2.7.0.0
   -

 - s6-tlsc-io has changed interfaces; now it's directly usable from a
terminal. This change should be invisible unless you were using
s6-tlsc-io without going through s6-tlsc (which, until now, there was
no reason to do).
 - s6-tcpserverd now logs "accept" and "reject" instead of "allow" and
"deny", this terminology now being reserved to s6-tcpserver-access.
 - The -h option to s6-tcpclient and s6-tcpserver-access has changed
semantics. Previously it was used to require a DNS lookup, and was
hardly ever specified since it was the default (with -H disabling DNS
lookups).
Now it means that DNS lookups must be preceded by a lookup in the
hosts database.
 - A new pair of options, -J|-j, is accepted by s6-tlsc-io and
s6-tlsd-io, and by extension the whole TLS chain of tools. -J means that
s6-tls[cd]-io should exit nonzero with an error message if the peer
fails to send a close_notify before closing the connection; -j, which
is the default, means ignore it and exit normally.
 - The TLS tunnels work as intended in more corner cases and
pathological situations.

 https://skarnet.org/software/s6-networking/
 git://git.skarnet.org/s6-networking


 * tipidee-0.0.2.0
---

 - Bugfixes.
 - New configuration options: "log x-forwarded-for", to log the contents
of the X-Forwarded-For header, if any, along with the request; and
"global executable_means_cgi", to treat any executable file as a CGI
script (which is useful when you control the document hierarchy, but
dangerous when it's left to third-party content manager programs).

 https://skarnet.org/software/tipidee/
 git://git.skarnet.org/tipidee


 Enjoy,
 As always, bug-reports welcome.

--
 Laurent



Re: restarting s6-svscan (as pid 1)

2023-11-18 Thread Laurent Bercot



> I believe (have not yet tested) that I can relatively simply create
the maintenance system on the fly by copying a subset of the root fs
into a ramdisk, so it doesn't take any space until it's needed.


 The problem with that approach is that your maintenance system now
depends on your production system: after a rootfs change, you don't
have the guarantee that your maintenance system will be identical to
the previous one. Granted, the subset you want in your maintenance fs
is likely to be reasonably stable, but you never know; imagine your
system is linked against glibc, you copy libc.so.6, but one day one
of the coreutils grows a dependency to libpthread.so, and voilà, your
maintenance fs doesn't work anymore.

 You probably think the risk is small, and you're probably correct.
I just have a preference for simple solutions, and I believe that a
small, stable, statically-linked maintenance squashfs would be worth
the space it takes on your drive. :)

--
 Laurent



Re: restarting s6-svscan (as pid 1)

2023-11-17 Thread Laurent Bercot

This may be a weird question, maybe: is there any way to persuade
s6-svscan (as pid 1) to restart _without_ doing a full hardware reboot?
The use case I have in mind is: starting from a regular running system,
I want to create a small "recovery" system in a ramdisk and switch to it
with pivot_root so that the real root filesystem can be unmounted and
manipulated. (This is instead of "just" doing a full reboot into an
initramfs: the device has limited storage space and keeping a couple
of MB around permanently just for "maintenance mode" doesn't seem like a
great use of it)


 As Adam said, this is exactly what .s6-svscan/finish is for. If your
setup somehow requires s6-svscan to exec into something else before
you shut the machine down, .s6-svscan/finish is the hook you have to
make it work.
 Don't let the naming stop you. The way to get strong supervision with
BSD init is to list your services in /etc/ttys. You can't get any
cheesier than that.

 That said, I'm not sure your goal is as valuable as you think it is.
If you have a running system, by definition, it's running. It has
booted, and you have access to its rootfs and all the tools on it;
there is
nothing to gain by doing something fragile such as exec'ing into
another pid 1 and pivot_rooting. Unless I've missed something, the
amount of space you'll need for your maintenance system will be the
exact same whether you switch to it from your production system or from
cold booting.

--
 Laurent



Re: s6-svunlink hangs with s6-2.12.0.0

2023-11-13 Thread Laurent Bercot

calling s6-svunlink for a s6-svlink'ed service hangs with s6-2.12.0.0
and succeeds with previous s6-2.11.3.2


 It's very probably caused by a known issue with s6-svscan. Could you
please try with the latest s6 git and tell me whether it's fixed there?

--
 Laurent



[announce] skarnet.org November 2023 release

2023-11-06 Thread Laurent Bercot



 Hello,

 New versions of all the skarnet.org packages are available.
 This is a big one, fixing a lot of small bugs, optimizing a lot behind
the scenes, adding some functionality. Some major version bumps were
necessary, which means compatibility with previous versions is not
guaranteed; updating the whole stack is strongly recommended.

 Also, tipidee is out! If you've been looking for a small inetd-like
Web server that is still standards-compliant and fast, you should
definitely check it out.

skalibs-2.14.0.0              (major)
nsss-0.2.0.4                  (release)
utmps-0.1.2.2                 (release)
execline-2.9.4.0              (minor)
s6-2.12.0.0                   (major)
s6-rc-0.5.4.2                 (release)
s6-linux-init-1.1.2.0         (minor)
s6-portable-utils-2.3.0.3     (release)
s6-linux-utils-2.6.2.0        (minor)
s6-dns-2.3.6.0                (minor)
s6-networking-2.6.0.0         (major)
mdevd-0.1.6.3                 (release)
smtpd-starttls-proxy-0.0.1.3  (release)
bcnm-0.0.1.7                  (release)
dnsfunnel-0.0.1.6             (release)
tipidee-0.1.0.0               (new!)


 * skalibs-2.14.0.0
   

 This version of skalibs adds a lot of new sysdeps, a lot of new
functions, and changes to existing functions, in order to support
the new features in other packages.
 The most important change is the new cspawn() function, providing
an interface to posix_spawn() with support for most of its options
with a fork() fallback for systems that do not have it.
 What this means is that on systems supporting posix_spawn(), the
number of calls to fork() in the whole skarnet.org stack has been
significantly reduced. This is important for programs where spawning
a new process is in a hot path - typically s6-tcpserver.

 Updating skalibs is a prerequisite for updating any other part of
the skarnet.org stack.
 Once you've updated skalibs, you probably don't *have to* update
the rest; old versions of packages should generally build with the new
skalibs as is, and if indeed they do, nothing should break. But it is
a major update, so there are no guarantees; please update to the
latest versions at your convenience.

 https://skarnet.org/software/skalibs/
 git://git.skarnet.org/skalibs


 * execline-2.9.4.0
   

 - execlineb now has a dummy -e option (it does nothing). This is so
it can be used as a replacement for a shell in more environments.
Also, execline programs use fork() a lot less, so overall execline
script performance is better.
 - The multicall setup did not properly install symbolic links for
execline programs; this is fixed, and is fixed as well as in other
packages supporting a multicall setup (s6-portable-utils and
s6-linux-utils).

 https://skarnet.org/software/execline/
 git://git.skarnet.org/execline


 * s6-2.12.0.0
   ---

 - s6 programs use fork() less.
 - New -s option to s6-svc, to send a signal by name or number.
 - s6-svscan has been entirely rewritten, in order to handle logged
services in a more logical, less ad-hoc way. It should also be more
performant when running as init for a system with lots of s6-supervise
processes (improved reaping routine).
 - The obsolete (and clunky) s6lockd subsystem has been deleted.
s6-setlock now implements timed locking in a much simpler way.

 https://skarnet.org/software/s6/
 git://git.skarnet.org/s6


 * s6-linux-init-1.1.2.0
   -

 - New -v option to s6-linux-init-maker, setting the boot verbosity.
 - Several small bugfixes, one of them being crucial: now your
systems shut down one second faster!

 https://skarnet.org/software/s6-linux-init/
 git://git.skarnet.org/s6-linux-init


 * s6-linux-utils-2.6.2.0
   --

 - Support for the minflt and majflt fields in s6-ps.

 https://skarnet.org/software/s6-linux-utils/
 git://git.skarnet.org/s6-linux-utils


 * s6-dns-2.3.6.0
   --

 - Support for on-demand /etc/hosts data in s6-dnsip and s6-dnsname.
It is achieved by first processing /etc/hosts into a cdb, then looking
up data in the cdb. You can, if you so choose, perform this processing
in advance via a new binary: s6-dns-hosts-compile.

 https://skarnet.org/software/s6-dns/
 git://git.skarnet.org/s6-dns


 * s6-networking-2.6.0.0
   -

 This is the package that has undergone the biggest changes.

 - No more s6-tcpserver{4,6}[d]. IPv4 and IPv6 are now handled by the
same program, s6-tcpserver, which chainloads into a unique long-lived
one, s6-tcpserverd.
 - s6-tcpserver now exports TCPLOCALIP and TCPLOCALPORT without the
need to invoke s6-tcpserver-access.
 - s6-tcpserver-access does not hardcode a warning when it is
invoked without a ruleset. It can now just be used for additional data
gathering (such as TCPREMOTEHOST) without jumping through hoops.
 - s6-tcpserverd has been thoroughly optimized for performance. It will
handle as heavy a load as the underlying system will allow.
 - Yes, this means you can now use s6-tcpserver to serve 

Re: s6-rc-update does not create pipes when already-running service becomes a consumer

2023-09-26 Thread Laurent Bercot



 I agree with all you're saying here...
 ... except that this case makes no sense in the first place. A consumer
and a non-consumer are fundamentally different things.

 A running service that is not a consumer does not read anything on its
stdin. It runs autonomously.
 A consumer, by essence, is waiting on stdin input, in order to process
it and do something with it.

 If a non-consumer were to become a consumer, it would mean its very
nature would be transformed. It would not behave in the same way at all.
And so, it makes no sense for it to be the "same" service. It should not
be treated the same way; it should not keep the same name.

 So, if anything, the change I would make is, when a non-consumer
becomes a consumer, s6-rc-update just throws an error and exits.

 The situation is different with a non-producer becoming a producer,
because a non-producer can already naturally write to its stdout, and
its output simply falls through to the catch-all logger. When becoming
a producer, it just needs to write into a pipe instead, and the pipe
already exists since it's created on the consumer side, which already
exists, so indeed it's all a matter of restarting the service.

--
 Laurent



Re: timestamp or timing info for s6-rc services?

2023-05-10 Thread Laurent Bercot

Hi all,
I am running s6 on buildroot with a bunch of custom services (and bundles) with 
dependencies.  Was there a way to turn on timestamps for s6-rc logging?  I am 
currently logging with -v 2 option set.  Or is there any timing information 
saved between a service starting and successfully started (or stopped) that I 
get at that is already saved somewhere?


 I don't know exactly what customization options Buildroot gives you,
but since the output from oneshots and from longruns without a dedicated
logger goes to the catch-all logger, you would achieve what you want by
making sure your catch-all logger has timestamps activated.

 If the setup is using s6-linux-init, the -t[1|2|3] option to
s6-linux-init-maker will do that; it should be an option in the
Buildroot configuration.

--
 Laurent



Re: Systemd unit parser and s6 generator

2023-04-24 Thread Laurent Bercot

what I can do, if it is of
interest to you, is list all the directives in a service file and
rate their conversion difficulty, so you can then evaluate your own
service files and assess the feasibility of an automated conversion tool.


 It took way longer than expected, and was a harrowing task, but it's
finally done - and I'm glad I did it because it was a fascinating dive
into the systemd worldview, and I'm all the more knowledgeable for it.
(And a little dumber, too.)

 The results can be found here:
 https://skarnet.org/software/s6/unit-conversion.html

 I could probably have written a comment - and, sometimes, even an
informative and non-snarky one! - for every single directive I listed,
but I had to restrain myself in order to finish the document before the
end of the year.

 Enjoy.

--
 Laurent



Re: [svlogd] / -ttt / why UTC?

2023-04-06 Thread Laurent Bercot

I don't know exactly what u mean with "TAI-10".
I guess u are referring to those 10 seconds


 Yes. You cannot set your system clock to TAI, unless you want wildly
incorrect results from time() and similar system calls. Setting it 10
seconds earlier than TAI is the best you can do; and that's what the
right/ timezones expect.



here i disagree a little bit:
As long as the software uses glibc's time functions to break down
"seconds since the start of 1970* to year, month, mday, hours, minutes, seconds,
the software does not need any patches...
Ur link is a little bit misleading or outdated in that point...


 Have you tried it?

 localtime() will work if your timezone is correctly set, yes.
 gmtime() *will not work*, because it assumes a UTC system clock.
 Programs making their own assumptions, and there are a lot of these,
will not work. Don't think everyone uses the libc; new compiled
language ecosystems such as for instance Go or Rust have their own time
libraries, and can you be certain they don't assume UTC?

 "As long as the software pays strict attention to the standards and
follows them to a T even for non-100%-cookie-cutter cases" is a very
strong hypothesis to make, even when you perform minor changes.
Changing the way the system clock counts time is NOT a minor change.



What would be wrong about the "-" (localtime time stamp) option?
Then I would not have to build/write my own *log daemon...
And svlogd just needs to use localtime_r(3)...


 In theory, nothing, except that it's a bad idea to timestamp logs
with local time (hello, we're in the 21st century, we manage computers
worldwide, we want logs to be shareable and mergeable across timezones).

 In practice, you're asking a maintainer to perform work that *you*
need, and that very few other people need. That's generally considered
bad form. The way to go about it would be to implement the functionality
yourself, and submit a patch; that said, since the last time we saw
svlogd's maintainer was four years ago in a flash, I probably wouldn't
bother if I were you.
 By recommending s6-log, I gave you the solution that requires from you
the least amount of work and the least amount of waiting.

--
 Laurent



Re: [svlogd] / -ttt / why UTC?

2023-04-06 Thread Laurent Bercot

My boxes use TAI (international atomic time) in order to have SI-seconds and 
60sec minutes and 24hrs days...


 If your system clock is set to TAI-10, then *all* the time-handling
software on your machine must be aware of it, in order to perform
time computations accurately. It is not sufficient to use the right/
time zones: a TAI-10 setup changes the interpretation of what
system calls such as time() and clock_gettime() return. It is only
possible to do this and have a consistent view of time if you control
any and every piece of software on your machine that needs to read
and display time.

 See https://skarnet.org/software/skalibs/flags.html#clockistai
for a description of the difference between "system clock set to TAI-10"
and "system clock set to UTC".

 If you're using a distribution - in your case, Void Linux - then
TAI-10 is not an option for you, unless the distribution explicitly
supports it and has built all its binaries to support it. Even if you
could make svlogd work with it, you would have trouble with other
software - this may be benign, like "date", or less so, like software
that checks the validity of X.509 certificates against certain things
like expiration dates.

 Believe me, I understand your plight, and I wish TAI-10 were the de
facto standard instead of UTC. I wish it were at least more widely
known and supported. I use it on my servers - but it comes at a heavy
price: not letting any third-party software, 99.9% of which is hardcoded
for UTC, interact with time, and only using my own tools, which support
the setup, to do that.

 I don't think you'll be able to keep running Void with a full TAI-10
setup and have everything work perfectly, at some point you'll have to
make a real choice. But right now, to superficially solve your current
issue, you could try running s6-log (from s6) in place of svlogd.
It does the same thing, but it supports TAI-10, if you build skalibs
with the --enable-tai-clock option, as described by the link above.
You probably have to build skalibs and s6 yourself: I don't think the
Void package for skalibs uses that option.

 Good luck,

--
 Laurent



Re: *-man-pages: s6-rc port, Makefile fix

2023-04-04 Thread Laurent Bercot




An mdoc(7) port of the documentation for s6-rc is now available:


 That's awesome, thanks a lot Alexis!

--
 Laurent



[announce] April 2023 bugfix release

2023-04-02 Thread Laurent Bercot



 Hello,

 New versions of some skarnet.org packages are available. They fix a few
visible bugs, so users are encouraged to upgrade.

 I usually do not announce bugfix releases. This e-mail is sent because
two new functionalities were also in git when the bugfixes needed to be
made, so they're now available:

 - A new -D option to execline's elgetopt, allowing customization of the
value for the ELGETOPT_n variable when the -n option is given and does
not expect an argument.

 - A new -R option to s6-linux-init-maker, allowing you to set hard
resource limits for the system at boot time.

skalibs-2.13.1.1   (release)
execline-2.9.3.0   (minor)
s6-2.11.3.2(release)
s6-linux-init-1.1.1.0  (minor)
s6-portable-utils-2.3.0.2  (release)


 Enjoy,
 Bug-reports always welcome.

--
 Laurent



Re: s6-rc how to log oneshot services

2023-03-07 Thread Laurent Bercot

Artix Linux achieves this with

pipeline -dw { s6-log ... } actual-oneshot

Not ideal due to the lack of supervision for s6-log, but it works.


 Yes, it works. The lack of supervision doesn't matter here because
the s6-log is short-lived: it will die when actual-oneshot exits. The
main drawback here is that it's overengineered: chances are that
actual-oneshot is not going to write enough data to trigger a rotation,
so directly appending to a file would be simpler and faster.
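
 For illustration, an untested sketch of that simpler approach as the
oneshot's up script (log path is an example; drop the shebang line if
this is an s6-rc "up" file rather than a standalone script):

  #!/command/execlineb -P
  # Append the oneshot's stdout and stderr to a plain log file instead
  # of spawning a short-lived s6-log.
  redirfd -a 1 /var/log/actual-oneshot.log
  fdmove -c 2 1
  actual-oneshot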

 Additionally, the -d option to pipeline isn't necessary: unless
actual-oneshot does very suspicious things, it shouldn't notice it
has a child (which will outlive it by a few microseconds).

--
 Laurent



Re: s6-rc how to log oneshot services

2023-03-07 Thread Laurent Bercot



 The default stderr for oneshot services is the default stderr of the
supervision tree you're running s6-rc with.
 So, if you're using s6-linux-init, stderr for oneshot services goes
directly to the catch-all logger and you don't need the redirection.

 If you want your oneshots to log to a different place, you need to
redirect the oneshot's stderr to where you want. Since oneshots are,
by definition, only run once, it's generally not necessary to have a
logger process for them; you can redirect to a file (but be careful
of concurrently running oneshots if you have several oneshots logging
to the same file!)

 You can send these logs to a logger process as well, but you cannot
use s6-rc's pipeline feature for that; you'll need to declare a fifo
manually, have a longrun service reading from that fifo, and have your
oneshot depend on that longrun and redirect its stderr to the fifo.
It's probably more trouble than it's worth.
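
 For completeness, an untested sketch of what that fifo-reading
longrun's run script could look like (fifo path and log directory are
assumptions):

  #!/command/execlineb -P
  # Open the fifo for reading without blocking, then switch it back to
  # blocking mode, and log everything the oneshots write to it.
  redirfd -rnb 0 /run/oneshot-logs.fifo
  s6-log t /var/log/oneshots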

--
 Laurent



Re: Systemd unit parser and s6 generator

2023-02-27 Thread Laurent Bercot

We've discussed internally if we change that process and try to write a
systemd unit parser, because all units are there in Ubuntu.

If we could catch 90% of all cases, we need, we would be happy.
If it would take 2 weeks of work, that would be fine.

Did somebody of you try to implement something? What are your thoughts?


 Hi Oli,

 This is a subject that comes up regularly in the #s6 IRC channel. And
I always answer something like this:

 The difficulty of converting systemd services to s6 services is not a
syntax problem. The unit file syntax is mostly fine.

 The difficulty is the difference of world modelization between systemd
and s6. systemd's view is very holistic, every component can depend on
another one or rely on a systemd-only feature, and the unit file syntax
reflects that; while s6's view is more local, every service is its own
independent thing.
 But more importantly, the way systemd maps the system into concepts is
pretty different from the way s6 maps the system into concepts, and
these two views are not 1-to-1 compatible. Translating a setup between
systemd
and s6 requires intelligence; it is not possible to write an automated
tool that does it accurately and idiomatically without going *deep* into
things.

 In practice, yes, it is possible to write a converter that will take
systemd.service files and turn them into s6 service directories, and
that will work "most" of the time, depending on how your service files
look. Most directives in systemd.service are directly translatable to
s6 features. However, as soon as you start using esoteric systemd
directives, the complexity explodes, and it's all a balancing act on
what you want to support vs. how difficult it is. And only
systemd.service files are automatically convertible; other unit files
such as systemd.socket, systemd.path and systemd.target are all
dependent on the systemd-centric system view and it is impossible to
treat them
without analyzing the entire system.

 I still owe you a couple hours of work, so what I can do, if it is of
interest to you, is list all the directives in a service file and
rate their conversion difficulty, so you can then evaluate your own
service files and assess the feasibility of an automated conversion
tool.

90% coverage is doable if your services are super simple and don't rely
on systemd-specific features, but it's very easy to get lost in the
weeds.


--
 Laurent



[announce] skarnet.org February 2023 release

2023-02-17 Thread Laurent Bercot



 Hello,

 New versions of some skarnet.org packages are available. It hasn't been
long since the last release, but lots of small things have happened and
it doesn't make much sense to let them rot in git.

 The main addition is a new multicall configuration for the execline,
s6-portable-utils and s6-linux utils packages. When you give the
--enable-multicall option to configure, a single binary is compiled,
and 'make install' installs this binary and creates symlinks to it.
This is useful to setups that focus on saving disk space.

 Credit for this addition goes to Dominique Martinet, who nerd-sniped me
into actually testing such a configuration; and it turned out the disk
space gains were very impressive for execline (up to 87%!)
I applied the idea to the s6-portable-utils and s6-linux-utils packages,
which are also made of small, very simple, independent programs, to see
whether it was viable in the general case; but as I suspected, the gains
were not as impressive, and making it work required a significant
refactoring effort. Since other skarnet.org packages would have an even
worse gains/effort ratio, the experiment is stopping there. execline is
an outlier, with a 177 kB amd64 static binary being able to replace a
1.3 MB set of binaries; that's much better than I thought it would be,
so it's worth supporting. Enjoy.

 Other changes include mostly bugfixes and quality-of-life improvements.

 The new versions are the following:

skalibs-2.13.1.0           (minor)
nsss-0.2.0.3               (release)
execline-2.9.2.0           (minor)
s6-2.11.3.0                (minor)
s6-rc-0.5.4.0              (minor)
s6-linux-init-1.1.0.0      (major)
s6-portable-utils-2.3.0.0  (major)
s6-linux-utils-2.6.1.0     (minor)
s6-networking-2.5.1.3      (release)
mdevd-0.1.6.2              (release)

 Details of some of these package changes follow.

* skalibs-2.13.1.0
  

 - Bugfixes.
 - New function: sals, listing the contents of a directory in a
stralloc. Straightforward, but a large-ish piece of code that was used
in multiple places and needed to be factored.

 https://skarnet.org/software/skalibs/
 git://git.skarnet.org/skalibs


* execline-2.9.2.0
  

 - New --enable-multicall configure option. This is the big one for
some distributions, that don't want to spend 1 MB of disk space on
execline binaries. (They already know my position on that.)

 https://skarnet.org/software/execline/
 git://git.skarnet.org/execline


* s6-2.11.3.0
  ---

 - Bugfixes.
 - Instance-related internal changes. Instanced service directories
need to be recreated with the new version of s6-instance-maker.
 - New s6-svc -Q command, instructing s6-supervise not to restart the
service when it dies (like -O) and to additionally create a ./down file
in the service directory.
 - s6-ioconnect will now always shutdown() socket endpoints at EOF time;
the -0, -1, -6 and -7 options are still supported, but deprecated.

 https://skarnet.org/software/s6/
 git://git.skarnet.org/s6


* s6-rc-0.5.4.0
  -

 - Bugfixes. In particular, s6-rc-update now conserves the existing
instances in an instanced service, whether the service is currently
active or not. In case of a live update, the current instances keep
running, but will restart with the new template next time they die
(which can be forced by a s6-instance-control -r invocation).
 - New s6-rc subcommands: start and stop, equivalent to "-u change"
and "-d change" respectively.

 https://skarnet.org/software/s6-rc/
 git://git.skarnet.org/s6-rc


* s6-linux-init-1.1.0.0
  -

 - s6-linux-init-maker: -U option removed. No early utmpd script is
created. The reason for this change is that distros using utmps need
stage 2 utmp services anyway (because the wtmp database needs to be
persistent so wtmpd and btmpd can only be started after a log filesystem
has been mounted), so utmp is unusable before that point no matter what.
Distros should have an utmpd service started at the same time as wtmpd
and btmpd; so utmp management goes entirely out of scope for
s6-linux-init.

 https://skarnet.org/software/s6-linux-init/
 git://git.skarnet.org/s6-linux-init


* s6-portable-utils-2.3.0.0
  -

 - s6-test removed, hence the major update.
 - New --enable-multicall configure option.

 https://skarnet.org/software/s6-portable-utils/
 git://git.skarnet.org/s6-portable-utils


* s6-linux-utils-2.6.1.0
  --

 - s6-mount option support updated.
 - New --enable-multicall configure option.

 https://skarnet.org/software/s6-linux-utils/
 git://git.skarnet.org/s6-linux-utils


 Enjoy,
 Bug-reports welcome.

--
 Laurent



[announce] skarnet.org January 2023 release

2023-01-14 Thread Laurent Bercot



 Hello,

 New versions of the skarnet.org packages are available. This release
is overdue, sorry for the delay - but finally, happy new year everyone!

 skalibs' strerr_* functions and macros, meant to provide shortcuts for
error message composition and output, have been rewritten; they're no
longer split between strerr.h and strerr2.h, but are all gathered in
strerr.h - the skalibs/strerr2.h header is now deprecated.
 This is released as a major version upgrade to skalibs because some
hardly ever used strerr macros have been outright removed; and the
deprecation of strerr2.h also counts as an API change. However, unless
you were using the deleted strerr macros (highly unlikely, as there
was no reason to, which is why they're being deleted in the first
place), your software should still build as is with the new skalibs,
maybe with warnings.

 The rest of the skarnet.org software stack has undergone at least a
release bump, in order to build with the new skalibs with no warnings.
Most packages also include several bugfixes, so upgrading the whole
stack is recommended.

 The new version of s6 includes a feature that has often been asked
for: an implementation of dynamically instanced services. Six new
commands allow you to create and manage dynamic instances of a given
service directory template, parameterized by an argument you give
to the run script.
 It also comes with a few quality-of-life changes, such as s6-log
line prefixing, as well as a good number of minor bugfixes.

 The "s6-test" program, formerly in s6-portable-utils, has migrated
to the execline package, where it is named "eltest". It still exists
in s6-portable-utils, but is deprecated and will be removed in a
future release.

 The new versions are the following:

skalibs-2.13.0.0              (major)
nsss-0.2.0.2                  (release)
utmps-0.1.2.1                 (release)
execline-2.9.1.0              (minor)
s6-2.11.2.0                   (minor)
s6-rc-0.5.3.3                 (release)
s6-linux-init-1.0.8.1         (release)
s6-portable-utils-2.2.5.1     (release)
s6-linux-utils-2.6.0.1        (release)
s6-dns-2.3.5.5                (release)
s6-networking-2.5.1.2         (release)
mdevd-0.1.6.1                 (release)
smtpd-starttls-proxy-0.0.1.2  (release)
bcnm-0.0.1.6                  (release)
dnsfunnel-0.0.1.5             (release)

 Details of some of these package changes follow.


* skalibs-2.13.0.0
  

 - Bugfixes.
 - New functions: buffer_timed_put, buffer_timed_puts, for synchronous
writes to a file descriptor with a time limit.
 - strerr2.h deprecated. strerr.h entirely revamped. Every existing
strerr interface is now a variable argument macro around the new
strerr_warnv, strerr_warnvsys, strerr_diev and strerr_dievsys functions,
which just print arrays of strings to stderr. This reduces the amount
of adhocness in the strerr code considerably, allows calls without an
upper bound on the number of strings, and should save some bytes in
resulting binaries.

 https://skarnet.org/software/skalibs/
 git://git.skarnet.org/skalibs


* execline-2.9.1.0
  

 - Bugfixes.
 - New program: eltest. This is the program formerly available in
s6-portable-utils as "s6-test", that has changed packages and been
renamed. It's a quasi-implementation of the POSIX "test" utility, that
was too
useful in execline scripts to be off in a separate package. (Quasi
because the exact spec is bad.) It understands -v, for testing the
existence of a variable, and =~, for regular expression matching.

 https://skarnet.org/software/execline/
 git://git.skarnet.org/execline


* s6-2.11.2.0
  ---

 - Bugfixes.
 - The one-second service restart delay can now only be skipped when the
service is ready. This prevents CPU hogging when a heavy service takes
a long time to start and fails before reaching readiness.
 - The name of the service is now passed as the first argument to ./run
and as the third argument to ./finish.
 - s6-log now understands a new directive: p. "pfoobar:" means that the
current log line will be prepended with the "foobar: " prefix. This
allows service differentiation in downstream log processing, which was
an often requested feature.
 - New commands available: s6-instance-maker, s6-instance-create,
s6-instance-delete, s6-instance-control, s6-instance-status,
s6-instance-list. They allow you to manage supervised sets of services
created from the same templated service directory with only a parameter
(the name of the instance) changing.

 https://skarnet.org/software/s6/
 git://git.skarnet.org/s6


* s6-portable-utils-2.2.5.1
  -

 - s6-test is now deprecated, replaced with the eltest program in the
execline package.

 https://skarnet.org/software/s6-portable-utils/
 git://git.skarnet.org/s6-portable-utils


 Enjoy,
 Bug-reports welcome as always.

--
 Laurent



Re: s6 vs shell wrappers for client connections

2023-01-11 Thread Laurent Bercot

if s6-svstat myserver; then
  client_binary
else
  send_email_to_admin
  faux_client_binary
fi


 Please don't do this.

 - As Steve would be able to tell you if he had read the documentation
page for s6-svstat (https://skarnet.org/software/s6/s6-svstat.html),
"s6-svstat myserver"'s exit code does not mirror whether myserver is
running or not. If myserver is a live service directory, s6-svstat
will print a line to stdout summarizing the status of the service,
then exit 0. Zero, even if the service is down, so client_binary
will fail.
 s6-svstat will only exit nonzero if an error occurs or if no
supervisor is running on myserver, which is not what this script
aims to test.

 - It is good to notify the admin when something on the system is not
working correctly. Doing it automatically from a script, however, is
not a good idea. Spamming the admin's mailbox via automated messages
is a great way to get yourself ignored at best (which defeats the
purpose of reporting bugs) or banned at worst.

 - If you're using s6, as I said earlier, you're supposed to be able
to assume that myserver is running, and you don't have to write
such wrappers, even ones that would work.

--
 Laurent



Re: s6 vs shell wrappers for client connections

2023-01-11 Thread Laurent Bercot

You have a program that can be started normally or as a service
that accepts connections through a socket. For client
connections, an additional binary is supplied.

The simplest way to make sure that the program launches
regardless of whether there's a server running or not is a
wrapper script that executes the right binary based on socket's
availability.


 In a vacuum, for a generic client, yes.
 On a machine you control, there is a simpler way: by policy, you
should *know* if a server is available or not.

 s6 won't help you much with making a client that works in uncertain
situations. What it will help you with, however, is reduce the unknowns
on a machine you use it with. In this case, it can ensure that you
*know*, by policy, that a given server is running, so you can assume
its presence - and if the assumption is wrong, it's a bug, and the
machine's admin should fix it.



Example of this is the 'foot' terminal emulator - it has 'foot
[--server]' and footclient binaries.  How, if at all, could s6
help remove this executable ambiguity, the need for checking and
wrapping?
(...)
To continue with the example: I set up 'foot' as described above.
The result is that s6-svscan/supervise starts the 'foot' server,
places an 'S' socket in the service directory and sits there.  I
can connect to the server by pointing to socket in service
directory
$ footclient -s "$s6-foot-servdir/S"

This however, still requires me to check for the socket and
if-else the binary and options each time I want to start the
program, doesn't it? Does s6 offer a remedy?


 If you're running a machine with an s6-supervised foot service, then
the point of it is that the foot server is always running. The
supervision tree will start it at boot time, and maintain it through
the machine's lifetime: so, the footclient invocation is always the
correct way to run a client. You don't have to plan for the case
where there's no server: if there's no server, it's a bug, and outside
the responsibility of the client to plan for.

 If you don't want to have a server permanently listening to a socket,
then don't add a supervised service; but in this case, you will have
to deal with the unknown state of your machine, and script your client
so it spawns a server as needed. Needless to say, I think that's a
technically inferior solution. ;)

(We had a good IRC discussion not long ago about the merits and
drawbacks of on-demand server spawning. My point of view is that
on-demand isn't nearly as useful as people pretend it is, because you
provision a server for max charge anyway, and idle services do not
consume resources - so on-demand spawning does not increase the
*power* of your setup, it only increases its *flexibility*, which is
very overrated on most modern setups which are, for better or worse,
built for a fixed set of tasks.)

 Hope this helps,

--
 Laurent



Re: [PATCH] Fix documentation typos

2023-01-06 Thread Laurent Bercot

-file named S6_FDHOLDER_STORE_REGEX is found is the env/
+file named S6_FDHOLDER_STORE_REGEX is found in the env/


 Applied, thanks.



-service readiness, you should give this option along with up: the service is ready iff
+service readiness, you should give this option along with up: the service is ready if


 Not a typo. "iff" means "if and only if", as any mathematician will
tell you. :)

--
 Laurent



Re: ca catchall logger prefix log lines with service names?

2022-10-25 Thread Laurent Bercot

I am trying to figure out if I can set up svscan catchall logger in such
a way that it prepends a service name to every log line, so that it can
be clear where the log came from.

I am trying to avoid s6-rc setup where I need to explicitly create a
matching logger service.


 You are saying "I want to use s6 without adopting s6 idioms because
I'm used to the way other supervisors do things, even when they're
giant hacks." :)

 It's much easier to have one dedicated set of logs per service, and
merge them later if that's what you want, than to have one big blob of
logs and separate them later for analysis. In other words: "cat | sort"
is cheaper than a series of "grep".

 The catch-all logger is exactly what it says: a catch-all. It only
gets the logs that are not diverted; it's not supposed to get
*everything* - although it can, if that's what you want, but then as
you noticed you get everything as is: output from the services go
directly to the catch-all logger. There is no intermediary process
that can prepend your log lines with information, because the services
have their output directly plugged into the pipe to the catch-all 
logger.


 The only way to modify the log lines is to insert an additional
process between your service and the catch-all logger that performs
the modification. And the simplest way to achieve that is to have
a foo/log service (or a foo-log consumer, if you're using s6-rc) for
every foo producer, that processes its stdin and writes the modified
lines to stdout, something like

  s6-format-filter "foo: %s"

 But that's still defining a consumer for your service. There is,
unfortunately, no way to do that without a consumer for every service.
And if you're going to add consumers anyway, you may as well write
dedicated loggers directly instead of line processors that send
everything back to the catch-all logger.
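
 For illustration, an untested foo/log/run sketch of such a dedicated
logger (log directory is an example; you would normally also drop
privileges with s6-setuidgid):

  #!/command/execlineb -P
  # Read foo's output on stdin, timestamp it and rotate it in its own
  # directory - no prefixing, no later grep needed.
  s6-log t /var/log/foo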

 Note that it's not s6 being difficult; it's s6 being out of the way
by default, by not touching your logs. If another supervisor allows
you to modify log lines by default, it means it's inserting itself
between the service and its logger, which is terribly inefficient.



But creates a log file where all logs from all services are mixed up and
are impossible to distinguish. There is already a strange looking thing
prepended to every line (@40006356da4b2cb3ba0a - a timestamp?)


 Yes, it's a TAI64N timestamp. Process the file through s6-tai64nlocal
to get human-readable timestamps. One of the advantages of TAI64N stamps
is that they're alphabetically sorted, so you can merge several log
files into one via "cat log1 log2 log3... | sort".

--
 Laurent



Re: Disconnect between s6-svscan and s6-rc

2022-10-25 Thread Laurent Bercot




The user of s6-rc gets no error message, and waits forever.
The error message is captured by s6-svscan (or a corresponding logger
for that service) and is either saved into a log file, or printed to a
tty on which svscan is running.
The user is almost never on the same tty with svscan. The user never
gets an error message from the service, unless they explicitly know
where to look.


 There's no "tty on which svscan is running". s6-svscan should not
run in a terminal. You *can* do it, but it's fragile: it means your
whole supervision tree is vulnerable to a ^C in a terminal.
 The output of s6-svscan, and the supervision tree by default, should
go to a catch-all logger and potentially to the system console.



It would be nice if a user can get a temporary view into the service's
stdout for the short duration of s6-rc execution? Such that when s6-rc
never exits waiting we can see what the error message that service
prints?


 For oneshots, it is the case: the output of oneshots will go to the
output of s6-rc.
 Unfortunately, for longruns, that's pretty difficult. One of the
points of supervised longruns is that they're always started with the
exact same environment, and that includes their descriptors. That means
they're outputting to the same place whether or not s6-rc is running.
In order to temporarily display error messages from longruns, s6-rc
would have to find out where the catch-all logger is, and display it
to the user while it's running. That's impossible while s6-rc is
independent from s6 policy.

 Unified policy for better interaction between tools in the toolset is
one of the goals for s6-rc v1, which is still a long ways away.

 For now, my advice would be to always use a timeout when running s6-rc
(you will have an error message if it times out), and/or to have the
supervision tree's output on the system console, and have the reflex to
check the console when you feel that s6-rc is taking more time than it
should. Sorry I don't have any better immediate solution.
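
 Something like the following sketch, with a hypothetical service name
(the timeout is in milliseconds):

s6-rc -t 10000 -u change myservice ||
  echo "s6-rc failed or timed out; check the catch-all logger" >&2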

--
 Laurent



Re: s6-rc as user service manager

2022-10-17 Thread Laurent Bercot




Thanks Peter, this was actually helpful and enhanced my mental model.
I think I can get away for now with a user's tree rooted in the system
tree. My graphics environment (sway) can start the necessary services
when it is started.


 Yeah, it's a recurring discussion on the IRC channels, and my answer
is always that "user services" as systemd does them aren't a well-
defined concept - "logging in" doesn't always have a clear meaning:
do sshd logins count? Or only seat sessions? What happens when the same
user logs in via 2 different seats and 1 sshd, then logs out of one
seat? then of the second seat? Should the services be stopped, and if
so, when?
 systemd has to make choices and makes things work, more or less, in the
common case of one user at one seat - but that's a very unsatisfying
answer from a developer's point of view.

 s6 users are also more likely to log in remotely more often than
systemd users, so maybe systemd's choices aren't the best ones for s6.

 "User services" are a can of worms, and since I'm always very reluctant
to enforce policy on users or make choices that will not work in all
cases, it's not one that I'm willing to open beyond what's provided by
s6-usertree-maker.

 I'm happy that you can work with a permanent user tree - that is a
well-defined concept that can be implemented and wholeheartedly
supported with s6.



By testing I meant checking whether the directory has an active process
watching it. I believe there is a function in skalibs, fd_lock [1],
that svscan uses to check whether another svscan runs there. I think it
is just a matter of exposing that function as a standalone executable.


 There are no executables to test whether s6-svscan or s6-rc are
running on a given directory, because these are not dynamic properties.
By policy, decided by you or your distro, you should *know*, at all
times, whether a given directory is a scandir with an s6-svscan running
on it - or whether a given directory is a livedir with s6-rc running
on it.
 If you think a given directory should have an s6-svscan running on it,
then you're right; ensure that s6-svscan is started at boot time, and
write your scripts assuming that it's there. If something fails because
it's not there, that's a bug or a system problem, and needs to be fixed,
not accommodated by your scripts.

--
 Laurent



Re: s6-rc user experience question

2022-10-17 Thread Laurent Bercot

 Perhaps a higher-level orchestration tool(s) is/are needed, that
will accomplish most typical workflows like: (...)


 These are all valid points, and things that ultimately s6-frontend,
the future UI over s6/s6-linux-init/s6-rc, aims to solve.
"Higher-level interface" is the (now) #1 feature request for the s6
family of tools, so be assured I've heard it. :)

 No promises on the date of delivery, but it's definitely on the radar.

--
 Laurent



Re: s6-rc user experience question

2022-10-17 Thread Laurent Bercot




Perhaps I can offer a few suggestions how to improve usability:
 - combine compile + svscan on empty dir + init steps into one, like
   `s6-rc init source_dir` and it does those steps under the hood.


 No, because these are operations that are ideally done at different
times.
 - Compilation happens "offline", before you (re)boot the machine.
 - s6-svscan is run independently from s6-rc, typically very early
during the boot sequence. (Ideally it's process 1.) In any case the
preexistence of a supervision tree is assumed by s6-rc, so spawning
one as part of an s6-rc command isn't good workflow.
 - Initialization happens when the service manager is ready to run,
which means when the machine is online and already has an s6-svscan
running.

 Combining these operations is only useful if you're doing things in
a very convoluted, suboptimal way.
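
 For reference, here is a sketch of the three separate operations, with
hypothetical paths; they are not meant to be run as a single script:

s6-rc-compile /etc/s6-rc/compiled /etc/s6-rc/source
  # offline, before (re)booting on the new database
s6-svscan /run/service
  # normally started by the init system, very early in the boot sequence
s6-rc-init -c /etc/s6-rc/compiled -l /run/s6-rc /run/service
  # once the machine is up and the supervision tree is running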



 - maybe instead of creating a file-based database, take on sqlite as a
   dependency and keep the compiled + live state there? sqlite is
   ubiquitous and very lightweight. It can save the trouble of
   inventing our own compiled database formats and folder structures


 The format of the compiled database is entirely orthogonal to the
usability. Adding a dependency to an external package would do exactly
nothing to improve the UX here; it would just make the code bigger,
slower, more complex, harder to maintain and less secure. And that's
without mentioning that a relational database is not at all the right
structure for the kind of data that s6-rc deals with.

 Not a great trade-off, if you ask me.

 However, there's a change I can make that would immediately improve
s6-rc's usability: since it's largely a perception issue, changing
the vocabulary from "s6-rc-compile" to "s6-rc-snapshot" and
"compiled database" to "opaque snapshot of the set of services" would
be a good idea. It would spook fewer people, and send fewer people
into red-herring trains of thought. :)



What are your plans / thoughts on s6-rc redesign?


 It's ongoing. It's difficult, and I always find a myriad of things to
do to distract me from working on it, so it's going (very) slowly. But
it's always lurking in the back of my mind, and there isn't a day that
I don't wonder what I have gotten myself into.

 The initial call for funds is still readable here:
 https://skarnet.com/projects/service-manager.html

--
 Laurent



Re: s6-rc user experience question

2022-10-16 Thread Laurent Bercot




1. First we create 'scandir1', and put services there. Each service is a
  svcdir. We put a dependencies file and a type file in each svcdir.
  (We do not run s6-svscan on it, because it doesn't really manage
  dependencies)


 That's a bad reason. :)
 The real reason why you don't run s6-svscan on a source s6-rc directory
is because it's not a suitable scan directory. It doesn't contain
service directories: it contains s6-rc definition directories, which
may be for longruns (and in this case *look like* service directories
but they're not, for instance the "run" file doesn't have to be
executable), but also for oneshots or bundles, and s6-svscan would not
know how to deal with those. And they have dependency information,
which s6-svscan cannot interpret, etc.

 Even if their formats are similar, a source directory and a scan
directory are not the same and aren't used for the same purposes.



2. We run s6-rc-compile pointing at 'scandir1' and get a 'compiled' dir as
  output.

3. We run an s6-svscan process on an empty dir - 'scandir2'.

4. We run s6-rc-init, feeding it the 'compiled' and 'scandir2' dirs, and we
  get a 'live' dir.

At this point things seem to be working and I can use s6-rc to bring up
and down services with dependencies.


 That is indeed the correct way to proceed.



 But this gets very confusing and
does not look like a good user experience:


 I agree it's confusing, and one of the reasons (but not the main one)
why s6-rc needs a redesign: you currently get access to all the internal
workings of the service manager, even when they're not relevant to you,
which is not ideal.



So same information is duplicated 3 times and symlinked 3 times.
Is this the intended flow? Or have I messed something up really badly?


 It's the intended flow, and the information duplication is necessary.

 - The source directories are where you, the user, edit the information.
It's your working copy of it. You should be able to change it as you
wish; the modifications you make should not impact the system.

 - The compiled directory is where the information is stored when
s6-rc-compile validates your source directories. It's an automation-
approved snapshot of the state of your source directories at the time
you called s6-rc-compile. It is immutable: neither you, nor the system
can modify it - and you can store it on a read-only filesystem. The
point is that once you have it, and it works once, it will *always* 
work.


 - The live directory is the working copy of the *system*. It manages
what is going on *right now* on your machine. Some of it is symbolic
links to a compiled directory, when the system only needs to read
information; but some of it is state that can be changed by the system
and that needs to be a copy. That includes longrun service directories,
that maintain running state, which is why these need to be copied
from the compiled directory.

 Having this duplication is the only way of ensuring that your
modifications do not change the snapshots you take when compiling and
do not impact the current machine state, and also that the operation
of your current services does not impact the set of services you're
booting on (which could lead to failed boots).

 Other service managers do not make the distinction between the
user working set, the system working set, and immutable snapshots; that
is a big issue I have with them, because it's difficult to make them
safe.

 Hope this helps,

--
 Laurent



Re: "Back off" setting for crashing services with s6+openrc?

2022-10-10 Thread Laurent Bercot




To me this seems like a relevant improvement that would catch a problematic 
edge case issue.


 I pushed such a change to the s6 git. A new numbered release should
be cut soon-ish.

--
 Laurent



Re: ftrig pipe naming convention

2022-10-04 Thread Laurent Bercot



 (I was on vacation, sorry for the delayed answer.)


Could you please elaborate on the possible race condition? This is simply
for curiosity and educational purposes. It feels like a lot of thought was
put into the s6 codebase, and a lot of ideas are not immediately obvious
to people not intimately familiar with the OS interface.


 When you want to listen to notifications, one of the first questions
that come to mind is: what happens if a notification comes before I
start listening to it? How do I make sure I don't miss anything?

 That's the primary race condition for notifications, and the reason
why a simple tool such as s6-ftrig-wait can only be used in the simplest
of settings: when you run s6-ftrig-wait, what you get is notifications
from the moment you run it, and you don't know what happened before.

 The answer is synchronization between the listener and the sender.
In order to make sure the listener misses nothing, *first* the listener
starts listening, *then* the sender is run and can notify the listener.
That's how s6-ftrig-listen1 works: the rest of its command line is
spawned *after* it has made sure there's a fifo listening in the
fifodir, and that command line is supposed to be something that tells
the sender that it's okay to start sending notifications now.
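
 For example, this sketch (with a hypothetical service directory) starts
listening before running the command that triggers the event, so no
notification can be missed:

s6-ftrig-listen1 /run/service/foo/event u s6-svc -u /run/service/foo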

 ftrigr_subscribe() is the primitive that makes a program listen to
notifications in a fifodir, and returns control to the client. It is
important because it is asynchronous: notifications will be read and
processed as soon as ftrigr_subscribe() returns, and the client can do
whatever it needs to do, such as read a state or prime the notification
sender, and then handle the notifications in its own time by calling
ftrigr_update(). The fact that you can do something between subscribing
and handling the notifications is fundamental, and what makes the
model strictly more powerful than "cat < $fifo".

 Internally, it's the s6-ftrigrd helper program, spawned by the
ftrigr_startf() primitive, that performs the system calls in the
fifodirs the client is subscribed to, filters notifications according
to regexps, and sends the relevant notifications to the client; it
is necessary to have an external program doing that, in order to save
a lot of menial work from the client and avoid forcing it into a given
asynchronous programming model. s6-ftrigrd hides all the low-level
details from the client and allows the ftrigr library to remain usable
in a variety of programming models.

 ftrigr_subscribe() simply tells s6-ftrigrd to open a new fifo in a
fifodir, and waits for an answer code. If it gets a successful answer
from s6-ftrigrd, it means the fifo is open and from now on every
notification will be correctly received and processed. The client can
then proceed to the operation that can cause notifications to be sent.
s6-ftrig-listen1 runs ftrigr_subscribe(), *then* spawns the rest of
its command line - that's how race conditions are avoided.

 In the case of supervision, this is used to track the state of a
service. When a command such as s6-svwait wants to wait until a
service is in a given state, *first* it runs ftrigr_subscribe(),
*then* it looks at the current registered service state (in the
supervise/status file), and then it computes the new service state
according to the data it receives from s6-ftrigrd. There is no race
window during which s6-svwait would have read the status file but not
be reading notifications yet, which would risk missing state changes.

 That is the main race condition that the ftrigr library solves.

 Now, additionally to that, there is another, less serious race
condition that is more directly related to what you were asking about,
with directly creating fifos in fifodirs.

 The "send a notification" primitive is ftrigw_notify() (with its close
family members for various details). It will open all fifos in a fifodir
that match the hardcoded name pattern in succession, and write a byte
to them. Normally, this write succeeds: there's a s6-ftrigrd reader
behind each one of these fifos - and anything else means there's a
problem. Most likely, it's a benign problem, such as a stale fifo:
s6-ftrigrd was killed before it had the chance to clean up, so there's
a useless, unused fifo lying around.
 ftrigw_notify() will then report, via its return code, that there was
a problem; and if it has the rights to do so (which is most
of the time), it tries to unlink the stale fifo, which cleans things
up for the next notification, and makes a manual invocation of
s6-cleanfifodir unnecessary. Simple and efficient.

 The Unix mkfifo() (or mknod()) system call just creates the fifo in
the filesystem. It does not open it. In order to open a fifo to read
on, you need to mkfifo() then open(). See where I'm going here?
If you "mkfifo $fifo" then "cat < $fifo", and a notification just
happens to arrive in between the two, ftrigw_notify() will see a fifo
that has been created but is without a reader, and assume it's stale.
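
 A sketch of the difference (the fifodir and fifo name are hypothetical,
and the manually created fifo is assumed to follow the naming
convention):

# racy: a notification arriving between these two commands hits a
# reader-less fifo, which the sender may treat as stale and unlink
mkfifo "$fifodir/$fifo"
cat < "$fifodir/$fifo"

# raceless: subscription and reading are handled by the ftrigr machinery
s6-ftrig-wait "$fifodir" u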

Re: "Back off" setting for crashing services with s6+openrc?

2022-09-30 Thread Laurent Bercot



 I feel like this whole thread comes from mismatched expectations of
how s6 should behave.

 s6 always waits for one second before two successive starts of a
service. This ensures it never hogs the CPU by spamming a crashing
service. (With an asterisk, see below.)

 It does not wait for one second if the service has been started for
more than one second. The idea is, if the service has been running for
a while, and dies, you want it up _immediately_, and it's okay because
it was running for a while, and either was somewhat idle (in which case
you're not hurting for resources) or was hot (in which case you don't
want a 1-second delay).

 The point of s6 is to maximize service uptime; this is why it does
not have a configurable backoff mechanism. When you want a service up,
you want it *up*, and that's the #1 priority. A service that keeps
crashing is an abnormal condition and is supposed to be handled by the
admin quickly enough - ideally, before getting to production.

 If the CPU is still being hogged by s6 despite the 1-second delay
and it's not that the service is running hot, then it means it's
crashing while still initializing, and the initialization takes more
than one second while using a lot of resources. In other words, you
got yourself a pretty heavy service that is crashing while starting.

 That should definitely be caught before it goes in production. But
if it's not possible, the least ugly workaround is indeed to sleep
in the finish script, increasing timeout-finish if needed.
(The "run_service || sleep 10" approach leaves a shell between
s6-supervise and run_service, so it's not good.)
./finish is generally supposed to be very short-lived, because the
"finishing" state is generally confusing to an observer, but in this
case it does not matter: it's an abnormal situation anyway.
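
 A minimal sketch of such a finish script (the 10-second value is
arbitrary):

#!/bin/sh
# throttle restarts of a service that keeps crashing during initialization;
# remember to raise ./timeout-finish above 10000 ms so s6-supervise
# does not kill the sleep
sleep 10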

 There is, however, one improvement I think I can safely make.
 Currently, the 1-second delay is computed from when the service
*starts*: if it has been running for more than one second, and crashes,
it restarts immediately, even if it has only been busy initializing,
which causes the resource hog OP is experiencing.
 I could change it to being computed from when the service is *ready*:
if the service dies before being ready, s6-supervise *always* waits for
1 second before restarting. The delay is only skipped if the service has
been *ready* for 1 second or more, which means it is really serving and
either idle (i.e. chill resource-wise) or hot (i.e. you don't want to
delay it).
 Does that sound like a valid improvement to you folks?

 Note that I don't think making the restart delay configurable is a good
trade-off. It adds complexity, size and failure cases to the s6-supervise
code, it adds another file to a service directory for users to remember,
it adds another avenue for configuration mistakes causing downtime, all
that to save resources for a pathological case. The difference between
0 seconds and 1 second of free CPU is significant; longer delays have
diminishing returns.

--
 Laurent



Re: ftrig pipe naming convention

2022-09-18 Thread Laurent Bercot




I wonder what is the reason behind the naming convention? What is the
downside of simply writing to any present fifo file ?


 It could work like you're suggesting. But :

 - checking the type of a file is an additional fstat() system call
 - there may be reasons in the future to store other files in the
fifodir that do not receive the event
 - it is nice to detect stale fifos, if any, and delete them as soon
as you can (#L39), and you don't want to delete unrelated files
 - but most importantly: creating a fifo in a fifodir that allows you to
receive events without a race condition, which is the whole point of the
ftrig library, is slightly more complex to do safely than just "mkfifo
event/foobar", and I don't want people to think that this is the API.
No, the API is ftrigr_subscribe(), and everything under it is
implementation details. Restricting the naming is a way of ensuring
(as much as possible) that the fifos were indeed created by the
appropriate programs.

 Don't create fifos willy-nilly in a fifodir, and since you found the
naming convention, don't use it to work around the check to create your
fifos outside of ftrigr_subscribe(). If you do, it will work, until the
time when it doesn't, and it will be a complete PITA to debug.

--
 Laurent



Re: Trouble starting services on boot with s6-rc

2022-08-20 Thread Laurent Bercot




I am a bit ashamed to admit I cannot find the logs. From reading
https://wiki.gentoo.org/wiki/S6_and_s6-rc-based_init_system#logger I thought
maybe I should be looking for the file /run/uncaught-logs, but could not find
any such file in my docker instance (I understand, docker is not Gentoo).


 By default, s6-overlay containers don't have a catch-all logger, so
all the logs fall through to the container's stdout/stderr.
 You can have an in-container catch-all logger that logs to
/run/uncaught-logs if you define S6_LOGGING to 1 or 2.



While the docs did speak a lot to the directory structure used by s6, I still
find it quite hard to figure out what the default directories are for some
things. (e.g. I was clear on where my uncompiled s6-rc service directories
should go, but they seemed to "magically" get compiled on boot and show up in
a scandir.)


 That's a thing with s6, and even s6-rc: it does not define policies,
but only mechanism. In other words: it lets you place things wherever
you want, there's no default.
 Of course, higher-level software that uses the s6 bricks needs to
define policies in order to get things done; that's why s6-overlay is a
set of scripts putting directories in certain places and calling s6
binaries with certain arguments and options.
 s6-overlay uses s6-rc-compile under the hood, so it compiles your
database for you at boot time, to keep things as simple as possible
(even though it's not the optimal way of doing it).




One additional item. It seems like not a great idea to smash the timeout for
all services. Is there any way to adjust it on a per-service basis? If not,
consider me a +1 to kindly add it to a wishlist somewhere.


 You can define timeout-up and timeout-down files in your s6-rc source
definition directories, they do just that. :)
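
 For example (the source directory path is hypothetical; values are in
milliseconds):

echo 60000 > /path/to/source/sleeponstart/timeout-up
echo 10000 > /path/to/source/sleeponstart/timeout-down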



n.b. You should put up a BTC lightning "tip bucket" somewhere :)


 Thank you. Please see this Twitter thread:
 https://twitter.com/laurentbercot/status/1209247040657674240

--
 Laurent



Re: Trouble starting services on boot with s6-rc

2022-08-20 Thread Laurent Bercot

Hoping this is the right place to ask for some help as I am very new to s6 and 
not well versed on any init system.


 s6-overlay questions are normally asked in the "Issues" section of the
s6-overlay GitHub repository, but since yours are really s6-rc questions,
it's fine :)  (even though the answers are related to s6-overlay!)



I can see that my services are registered with s6-rc
[Aug 19 18:43:04 root@s6 ~]# s6-rc-db list all | grep -e longrun_test -e 
sleeponstart
sleeponstart
longrun_test


 That proves your services are *in the database*, not that they're going
to get started. To make sure your services are started on boot, you
should also check that they're declared in the "user" bundle (that's
the bundle where users declare what services are part of the final
machine state, by s6-overlay policy).

$ s6-rc-db contents user | grep -e longrun_test -e sleeponstart

 But since they *are* started when you boot your container, it means
they're indeed declared there, so that's not the issue. The real issue
is here:


root  34  0.0  0.019252 pts/0S+   18:53   0:00 
/package/admin/s6-2.11.1.0/command/s6-sudoc -e -t 3 -T 4999 -- up 3


 The "-T 4999" part means there's a 5 second timeout on running your
sleeponstart service. And since this service is waiting for 30 seconds
before succeeding, and 30 > 5... it means that it times out after 5
seconds and gets killed, so the transition fails, and longrun_test
(which depends on it) is never brought up.

 s6-overlay-specific section:

 - You should see all that happening in your container's logs - s6-rc
prints informative messages when something fails. Unless you have
changed the S6_VERBOSITY value from the default 2 to something lower.

 - The 5-second timeout likely comes from the fact that you have not
modified the S6_CMD_WAIT_FOR_SERVICES_MAXTIME value, which is 5000 by
default. As is written in both the README[1] and the migration guide[2]. :P

If your real service needs it, you can disable the timeout by adding to
your Dockerfile:
ENV S6_CMD_WAIT_FOR_SERVICES_MAXTIME=0
or you can adjust it to a bigger value than 5000.

 Good luck,

--
 Laurent


[1]: https://github.com/just-containers/s6-overlay/blob/master/README.md
[2]: 
https://github.com/just-containers/s6-overlay/blob/master/MOVING-TO-V3.md




Re: Pattern for multiple subservices and dynamic discovery i.e. VPN

2022-08-18 Thread Laurent Bercot




That would just move 3 components to another level but they are
still needed: scanning existing service directories, diffing between
desired and current state and applying - so creating or removing
directories.


 So, diffing between desired and current state, and applying the
modifications are components of a *service manager*, not a supervision
suite, and it is important to maintain the distinction in order to
avoid scope creep in s6.

 Even when a service is *not* instanced, these components are somewhat
needed; it's just not noticed because their implementation over a
single supervised service is trivial. But it is important to remember
that the job of a supervision suite is to maintain the service in its
current state (up or down), *not* to manage the wanted state or apply
it. (Of course, it does provide tools to perform state transitions
for longruns, but it comes with no policy on when to call these tools.)

 The components you want definitely have their place in s6-rc; but in
the meantime, they can also be scripted on top of regular s6 if you
have a good modelization for implementing instances, which I will add
in the near future.



I see there a problem with multiple dynamic services. I'm not sure
about concurrency behaviour of updating processes in the service
directory. Maybe Laurent can explain problems in that area, if they
exist.


 s6 manages processes and every supervised process needs its own
service directory. There will be as many service directories as
they are instances. (Some components of a template service directory
can of course be reused.) So there's no concurrency issue; however,
the instance management tool I'm thinking of could adopt various
updating methods depending on what you want. Best effort? Clean
shutdown, service replacement, then firing up of the new service's
instances? Rolling upgrade across the instances? These policies all
have their uses.



I'm not sure how complex the supervision itself is - however I would
love to solve the problem without doing supervision on my own. I
thought about your approach as well but it really depends how resilient
an update process is.


 It will definitely be resilient, but there are several ways to
implement it, see above.

--
 Laurent



Re: Pattern for multiple subservices and dynamic discovery i.e. VPN

2022-08-18 Thread Laurent Bercot

- we need a scanning component for the desired state of running
 instances (something like 'find /etc/openvpn -name "*conf"')
- we need a scanning component for the current state in the process list
- we need a diffing component
- we need a state applier component


 That sounds very much like what is planned for s6-rc v1, so I think
you will like it when it drops - but it won't be in the near future.

 However, I have some ideas for new s6 tools that wouldn't follow this
model directly but would make it easy for users to create and delete
new instance models, and add/remove instances - so your components
could be implemented over these tools by simple shell scripts. I'll
try to work on that soon.

--
 Laurent



Re: Pattern for multiple subservices and dynamic discovery i.e. VPN

2022-08-17 Thread Laurent Bercot


I'm looking for a pattern to solve a problem, where you have to
discover dynamically the services you have to start.

Examples could be VPN configurations, where you discover the
configuration files and start for every file an instance of the VPN
service.


 Hi Oliver,

 Dynamic instantiation is a real pain point - it's an often requested
feature, but it's surprisingly hard to make it work correctly and
safely in a supervision scheme. Supervision works very well in static
environments, but dynamic discovery is at odds with the architecture.

 I have a few ideas to mitigate that and help people create instanced
services. Instantiation is a planned feature of the future s6-rc v1
but it's still a ways away; I am also thinking of adding tools to help
people handle instances with regular s6, and they may come in the near
future, but there are currently no such helpers, sorry.

--
 Laurent



Re: Be prepared for the fall of systemd

2022-08-04 Thread Laurent Bercot




What do we as a community need to do
to get S6 into a "corporate friendly" state?

What can I do to help?


 "Corporate-friendly" is not really the problem here. The problem is
more "distro-friendly".

 Distributions like integrated systems. Integrated systems make their
lives easier, because they reduce the work of gluing software pieces
together (which is what distros do). Additionally, they like stuff like
systemd or openrc because they come with predefined boot scripts that,
more or less, work out of the box.

 There are two missing pieces in the s6 ecosystem before it can be
embraced by distributions:

 1. A service manager. That's what's also missing from runit. Process
supervisors are good, but they're not service managers. You can read
why here[1].
 In case you missed it, here is the call for sponsors I wrote last year,
explaining the need for a service manager for s6: [2]. It has been
answered, and I'm now working on it. It's going very slowly, because I
have a lot of (easier, more immediately solvable) stuff to do on the
side, and the s6-rc v1 project is an order of magnitude more complex
than what I've ever attempted before, so it's a bit scary and needs me
to learn new work habits. But I'm on it.

 2. A high-level, user-friendly interface, which I call "s6-frontend".
Distros, and most users, like the file-based configuration of systemd,
and like the one-stop-shop aspect of systemctl. s6 is lacking this,
because it's made of several pieces (s6, s6-linux-init, s6-rc, ...) and
more automation-friendly than human-friendly (directory-based config
instead of file-based). I plan to write this as well, but it can only
be done once s6-rc v1 is released.

 Once these pieces are done, integration into distributions will be
*much* easier, and when a couple distros have adopted it, the rest
will, slowly but surely, follow suit. Getting in is the hard part, and
I believe in getting in by actually addressing needs and doing good
technical work more than by complaining about other systems - yes,
current systems are terrible, but they have the merit of existing, so
if I think I can do better, I'd rather stfu and do better.



Here are some ideas:
- easier access to the VCS (git, pijul, etc)


 The git repositories are public: [3]
 They even have mirrors on github.
 All the URLs are linked in the documentation. I don't see how much
easier I can make it.

 Note that the fact that it's not as easy to submit MRs or patches as
it is with tools like gitlab or github is intentional. I don't want to
be dealing with an influx of context-free MRs. Instead, if people want
to change something, I'd like *design discussions* to happen on the ML,
between human beings, and when we've reached an agreement, I can either
implement the change or accept a patch that I then trust will be
correctly written. It may sound dictatorial, but I've learned that
authoritarian maintainership is essential to keeping both a project's
vision and its code readability.



- Issue tracking system


 The supervision ML has been working well so far. When bigger parts
of the project (s6-rc v1 and s6-frontend) are done, there may be a
higher volume of issues, if only because of a higher volume of users, so
a real BTS may become an asset more than a hindrance at some point.
We'll cross that bridge when we get to it.



- CI/CD build chain (being careful not to make it too painful to use)


 Would that really be useful? The current development model is sound,
I think: the latest numbered release is stable, the latest git head
is development. The s6 ecosystem can be built with a basic
configure/make/make install invocation, is it really an obstacle to
adoption?

 I understand the need for CI/CD where huge projects are concerned,
people don't have the time or resources to build these. I don't think
the s6 ecosystem qualifies as a huge project. It won't even be "huge"
by any reasonable metric when everything is done. It needs to be
buildable on a potato-powered system!



- "idiot proof" website
- quick start / getting started guide
- easier access (better?) Documentation


 I file these three under the same entry, which is: the need for
community tutorials. And I agree: the s6 documentation is more of a
reference manual, it's good for people who already know how it all works
but has a very steep learning curve, and beginner-to-intermediate
tutorials are severely lacking. If the community could find the time
to write these, it would be a huge help. Several people, myself
included, have been asking for them for years. (For obvious reasons,
I can't be the one writing them.)

 Turns out it's easier to point out a need than to fulfill it.

 It's the exact same thing as the s6 man pages. People can whine and
bitch and moan for ages saying that some work needs to be done, but when
asked whether they'll do it, suddenly the room is deathly silent.
For the man pages, one person eventually stepped up and did the work[4]
and I'm forever grateful to them; I 

Re: [DNG] Be prepared for the fall of systemd

2022-08-04 Thread Laurent Bercot




I find it symptomatic of the fact that a guy wrote some Rube Goldberg code and
a corporation decided it would be a great idea to spend millions getting the
Rube Goldberg code into many major distros. As far as us running out of road
with the Unix API goes, systemd solves no problem and offers no improvement
that couldn't have been solved or improved in a dozen different ways, almost
all of which would have been more modular and less prone to vendor lock-in.


 We know all of this.
 Please keep these posts out of the supervision ML. It's not that we
don't like systemd-bashing, it's that it takes a lot of space and makes
a lot of noise, and I'd like the supervision ML to be laser-focused on
solving technical issues with supervision systems, not be yet another
place where we complain about systemd day in day out. Thanks.

--
 Laurent



Re: runsvdir does not run runsv recursively, as is documented.

2022-08-03 Thread Laurent Bercot

I'm trying to set up services, which are in subdirectories of other services. 
This is supported, according to the second paragraph of the runsvdir man page:

runsvdir starts a runsv(8) process for each subdirectory,  or
symlink  to a directory, in the services directory dir, up to
a limit of  1000  subdirectories,

In my directory service/ I have:
Foo
Foo/run
Foo/bar
Foo/bar/run


 That's not what the man page means. The "subdirectories" are
subdirectories *of dir*.
 runsvdir will understand service/Foo (and it would understand a
top-level service/bar), but it will not understand service/Foo/bar.

 The exception is if you have a service/Foo/log directory, but that's
handled by runsv, not runsvdir: service/Foo/log is a logger for
service/Foo.

--
 Laurent



[announce] skarnet.org Summer 2022 release

2022-06-14 Thread Laurent Bercot



 Hello,

 New versions of some skarnet.org packages are available.

 skalibs has undergone a major update, mostly to yet again revamp
librandom. This time I am happy with the API and implementation: I
believe it finally addresses all the cases in a satisfying way, providing
cross-platform failure-free pseudorandom number generation with options
to choose between waiting until the entropy pool has been initialized and
possibly getting less cryptographically secure data if the entropy pool
is too shallow. It wasn't easy to design; it's here at last.

 Compatibility with previous skalibs version is not assured, but apart
from librandom, and one additional function, no other interface has been
modified, so the compatibility breaks are minimal and a lot of software
will still build with this version without needing any modification.

 Most of the rest of the skarnet.org software stack has undergone at
least a release bump, in order to build with the new skalibs; a large
part of it has also received some changes and fixes. Some packages did
not need changing at all: no release is provided for these, they should
keep building with the new stack.

 execline comes with a quality-of-life parser change: backslashes at the
end of lines are now ignored, which makes it possible to directly copy
some multiline commands from shell scripts.

 s6-linux-utils comes with a new utility, rngseed, which is an original
implementation of Jason Donenfeld's seedrng[1]. This is the work that
made it necessary to get librandom right once and for all. With rngseed,
no Linux system should ever have uninitialized entropy pool problems
ever again.

 The new versions are the following:

skalibs-2.12.0.0  (major)
utmps-0.1.2.0 (minor)
execline-2.9.0.0  (major)
s6-2.11.1.1   (release)
s6-rc-0.5.3.2 (release)
s6-linux-init-1.0.8.0 (minor)
s6-portable-utils-2.2.5.0 (minor)
s6-linux-utils-2.6.0.0(major)
s6-dns-2.3.5.4(release)
s6-networking-2.5.1.1 (release)
mdevd-0.1.5.2 (release)
dnsfunnel-0.0.1.4 (release)

 Details of some of these package changes follow.


* skalibs-2.12.0.0
  

 - librandom rewritten. The random_init and random_finish functions are
removed. The new random_buf function, which replaces random_string(),
never fails. It blocks if the entropy pool is not initialized; the new
random_buf_early function is the same, but does not block.
random_devurandom is now exported, but should not be needed except in
very specific cases (rngseed).
 - New functions added: waitn_posix and waitn_reap_posix, openc*_at.
 - readnclose is now exported.
 - openreadnclose_at() now returns an ssize_t, aligning with
openreadnclose(). You should check your code for any use of
openreadnclose_at(), and adapt it to the new API. (Previously it returned
a size_t and the user was supposed to assume an error if it didn't fill
the entire length of the buffer. Now errors are reported with -1.)
 - Endianness conversion primitives reworked. The nonportability of
endian.h and bswap has always been a pain point; the new portable
functions in skalibs should now be just as efficient as the
system-dependent endian.h functions.
 - Added an implementation of the blake2s hash.

 https://skarnet.org/software/skalibs/
 git://git.skarnet.org/skalibs


* utmps-0.1.2.0
  -

 - Nothing to do with the new skalibs; utmps-0.1.2.0 has been available
for a while, but was never properly announced. The main feature is that
utmps-wtmpd can now take an argument naming its database file. This is
useful for implementing btmp, one of the numerous idiosyncrasies of
historic Linux software.

 https://skarnet.org/software/utmps/
 git://git.skarnet.org/utmps


* execline-2.9.0.0
  

 - Bugfixes.
 - The execlineb parser has been rewritten and its transition table is
now documented.
 - The wait command can now wait for *one* of the listed processes, in
addition to its original capability of waiting for *all* of them. It can
also stop waiting after a timeout. The new features can be used even
when wait is used in posix mode.

 https://skarnet.org/software/execline/
 git://git.skarnet.org/execline


* s6-linux-init-1.0.8.0
  -

 - The system scandir is now configurable at compile-time via the
--scandir configure option. It is a relative path under the tmpfsdir.
The default is still "service", for a /run/service default scandir.

 https://skarnet.org/software/s6-linux-init/
 git://git.skarnet.org/s6-linux-init


* s6-portable-utils-2.2.5.0
  -

 - s6-test now understands the =~ operator, matching its left argument
against an extended regular expression given as its right argument (this
is originally a GNU bash extension to test).

 https://skarnet.org/software/s6-portable-utils/
 git://git.skarnet.org/s6-portable-utils


* s6-linux-utils-2.6.0.0
  --

 - New command: 

Re: Unprivileged Shutdown?

2022-05-28 Thread Laurent Bercot




I have been using simple privilege escalation to poweroff the machine,
but looking through the source code for s6-linux-init-shutdownd and
friends, it appears the only constraint on interacting with the daemon
is the permissions on run-image/service/s6-linux-init-shutdownd/fifo.

The default appears to be:
600 root root
I've changed it on my system to be:
620 root power
and added my user to the power group.

This seems like the cleanest way to implement unprivileged
poweroff/reboot, but I'm concerned that it's not possible by default.
Is there a better way, or is it just meant to be done manually?


 No, you are correct that it is the right mechanism.

 Allowing unprivileged shutdown is a dangerous operation and should
only be done in very specific circumstances (i.e. when a normal user
has complete seat and console access), so it's not the default and the
mechanism is fairly hidden.
 If there's demand, I can probably write a s6-linux-init-shutdown-perms
program in a future version that would let you specify the user/group
allowed to shutdown, rather than having you manually tinker with the
fifo.
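
 In the meantime, here is a sketch of the manual tinkering, assuming the
default /run/service scandir; note that changes made at run time do not
survive a reboot unless you also make them in your run-image:

chgrp power /run/service/s6-linux-init-shutdownd/fifo
chmod 620 /run/service/s6-linux-init-shutdownd/fifo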

--
 Laurent



Re: s6 xinit replacement?

2022-05-14 Thread Laurent Bercot

Is the purpose of executing setsid() in s6-supervise to allow for the
services to continue beyond the termination of the supervision tree?


 It's actually the opposite: it's to protect the supervision tree
against misbehaved services. :) setsid() makes sure the service is
isolated, and a killpg() or equivalent won't affect anything outside
of it. Of course, it also protects *other* services running under the
same supervision tree.



If that's the case, could there possibly be a flag to disable that,
with the understanding that something like nohup or even s6-setsid would
be necessary to replicate that behavior?  That would enable a non-root
Xorg to be managed by s6.


 The direction s6 has taken is really the opposite: there was such a
flag in earlier versions, but it was causing a lot of special cases
and problems I definitely did not want to deal with.
 The real issue is that a supervision tree should not be run with a
controlling terminal. It's not meant to be run from a logged-in user
process, but as a background infrastructure that's always there. The
whole point of s6 is to make your services *more reliable*; and there
are few things less reliable than a whole tree of processes that can
die on an accidental ^C.

 Because users insisted a lot, there are still accommodations for
killing a whole supervision tree with ^C when s6-svscan has been
launched in a terminal. It is a nice feature to have (and although it
was by design that services persisted beyond the ^C, it was unintuitive
to most users, so from a pure UI standpoint, killing the entire tree in
one go was better).
 However, I'm not going back on the "each service runs in its own
session" thing, because if there's a case for allowing the user who controls
s6-svscan to kill the whole tree at once, there is just no case for
allowing a service running under the tree to impact other services and
the tree itself.

 Despite this, you're right that the pattern of xinit is similar to what
s6 does, and it *is* possible to run Xorg under s6; several users are
doing so, and I hope they will post their setup. (You may have more
luck by asking in the IRC channel, but it would be nice for the setup
to be documented on the mailing-list.) It does not involve running
s6-svscan from your own VT; it involves having a supervision tree
already running (as your user), and starting the Xorg service, e.g.
with a s6-svc -u command, on a given VT, possibly passed to the run
script via a file that your xinit emulation script would fill in
with the output of `tty`.

 If you insist on doing hacky things, you could even probably get away
with something that looks like:

xinit emulation:
#!/bin/sh
tty | sed s:/dev/tty:: > /home/me/.screen-number
s6-supervise /home/me/Xorg &
# small race here that disappears when s6-supervise has already run once
s6-svwait -U /home/me/Xorg
your-X-client
s6-svc -dx /home/me/Xorg

/home/me/Xorg/notification-fd:
3

/home/me/Xorg/run:
#!/bin/execlineb -P
backtick -E screen { cat /home/me/.screen-number }
export DISPLAY :$screen
X -displayfd 3 :$screen vt$screen

-displayfd is used as a notification mechanism, unblocking s6-svwait
when the X server is ready.

 Hope this helps (and I hope users who actually have done something
similar will share their experience),

--
 Laurent



Re: Supervision on the BSD's

2022-04-09 Thread Laurent Bercot

In searching, I found some messages on the Skaware lists about
running s6 as PID 1 on FreeBSD; has that work been published anywhere?
I'm not sure if I want to go so far as replacing PID 1 right out
of the gate, but having some existing service directories would be
nice.

 I have done some experiments and my conclusion was that to replace
pid 1 on FreeBSD, a real s6-freebsd-init package was needed, because
the way the BSDs organize their init and shutdown is radically
different from the way Linux does it, and the conversion is far from
obvious.

 However, you don't need to replace pid 1 to run s6 on a BSD. As
mentioned in https://skarnet.org/software/s6/s6-svscan-not-1.html , you
can start a supervision tree from /etc/ttys, and run your services
under it. It will work like on any other system.

 Quite a few people on the #s6 channel on IRC (OFTC network) are using
s6 on a BSD, so if you're looking for example service directories, and
various tips and tricks, I suggest you join the channel and ask them. ;)



Have I correctly understood how daemons/services work on the BSD's?
If not, what am I missing? Are the daemons included with the
distributions so incredibly stable that they don't need supervision
in order to keep the system functional?


 The BSDs are tightly integrated systems, more than "distributions", and
especially with OpenBSD, daemons are carefully audited and patched so
they are indeed super stable. Which is a very good thing - but because
of that, the BSD community tends to look down on supervision, without
understanding that it has other benefits than auto-restarting crashed
daemons.



Finally, if you wanted to create a router that you could (metaphorically)
put in a closet and forget about for 5 years, what approach would
you take? My initial thought was OpenBSD + s6, but I worry now that
there could be an impedance mismatch between these systems.


 OpenBSD + s6 will definitely work. Just make sure not to get in the
way of how OpenBSD does things; run an s6 supervision tree at boot
time and start your services under it as you see fit, independently from
OpenBSD's rc.

 Since the BSDs don't care for supervision, though, depending on
upstreams it may be difficult to find options for your packaged daemons
that stop autobackgrounding and that are not debugging options. Just a
small practical hurdle, but when it happens it can be infuriating.

--
 Laurent



Re: s6-linux-init and ttys in lxc containers

2022-03-08 Thread Laurent Bercot

s6-linux-init: warning: unable to ttyname stdout: No such device

I suspect this is due to the mechanism described on 
https://github.com/lxc/lxd/issues/1724#issuecomment-194412831, although I’m not 
using LXD, only lxc (which does not have a daemon running as root).


 You're right, it's the exact same issue. If /proc/1/fd/1 does not
point to a valid pts in the container, then ttyname() will be unable to
resolve it and you will get that warning message - and you also will be
unable to transmit your controlling terminal to stage 2 (which is not a
problem in your case).

 The only solution, I'm afraid, is to make sure your pseudo-filesystems
are correctly prepared before you launch your container, which you may
not be able to do without root privileges. Docker can do it; if you
were running lxd there would probably be a way to do it as well. But if
you have no use for a controlling terminal, you can forget about all
this and just ignore the warning.

--
 Laurent



[announce] Small skarnet.org update

2022-03-07 Thread Laurent Bercot



 Hello,

 New versions of some skarnet.org packages are available.

 The changes are minor, mostly quality-of-life and small additions
driven by the new version of s6-overlay.
 There are bugfixes all around, so users are encouraged to upgrade even
if they're not using s6-overlay.

 The new versions are the following:

skalibs-2.11.2.0(minor)
execline-2.8.3.0(minor)
s6-2.11.1.0 (minor)
s6-portable-utils-2.2.4.0   (minor)
s6-linux-init-1.0.7.3   (release)

- skalibs features better sysdep autodetection when custom compilation
flags are provided, and adds an option to envdir_internal() for
unlimited-size variable reading.
- execline adds the -P and -p options to getpid, to get the parent pid
or force the default behaviour.
- s6 features world-usable s6-applyuidgid and s6-setuidgid, as well as
a new -L option to s6-envdir for unlimited-size variable reading.
- s6-portable-utils adds a -N option to s6-dumpenv (add a newline after
dumping a variable) for easier reading back via s6-envdir.
- s6-linux-init fixes corner cases when used in containers.

 Enjoy,
 Bug-reports welcome.

--
 Laurent



Re: s6-svscan shutdown notification

2022-02-23 Thread Laurent Bercot




What's the cleanest way to wait for s6-svscan to shut down after issuing
a SIGTERM (say, via s6-svscanctl -t)?


 Be its parent, and wait for it. :)
 On SIGTERM, s6-svscan will not exit until the supervision tree is
entirely down, so that will work.
 If you're not the parent, then you'll have to wait for a notification
somehow, but that's easy:

 When s6-svscan wants to exit, it doesn't exit right away, but
tries to exec into the .s6-svscan/finish script. So you have a clear
indicator here: when .s6-svscan/finish runs, it means the supervision
tree is down.
 So, for instance, make a finish script that writes a byte in a fifo,
and have your jail shutdown script read on that fifo. Something like:

.s6-svscan/finish:
#!/bin/sh
exec echo > /run/blah/fifo

shutdown script:
#!/bin/sh
...
rm -f /run/blah/fifo
mkfifo /run/blah/fifo
read < /run/blah/fifo &
s6-svscanctl -t /run/service
wait
...

(read on the fifo before running s6-svscanctl, to avoid the small
race condition.)



Looking at the documentation, my only option appears to be to check if the
return code of s6-svscanctl is 100, or maybe to monitor for the existence
of .s6-svscan/control (not sure if it's removed on exit). Are there any
other ways to monitor s6-svscan?


 Ew. Don't poll.
 Use .s6-svscan/finish to do anything you want to do at s6-svscan
death time.

--
 Laurent



Re: s6-log weirdness

2022-02-04 Thread Laurent Bercot

I noticed that in some cases s6-log exits cleanly but does not log
anything. What's worse, it depends on the message content.


 Hi Vallo,

 That's the difference between '!zstd -q' and '!zstd' -q ;)

 When -q isn't a part of your processor command, but a part of the
s6-log command line, it is interpreted as a selection directive,
and will filter anything that contains a 'q' character.
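
 In other words, with a hypothetical logdir (the processor and its
options must form a single argument):

# correct: -q is part of the processor command, passed to zstd
s6-log t s1048576 '!zstd -q' /var/log/myservice
# oops: -q is read as a deselection directive, dropping lines containing "q"
s6-log t s1048576 '!zstd' -q /var/log/myservice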

--
 Laurent



Re: [announce] skarnet.org Winter 2021-2022 release

2021-12-22 Thread Laurent Bercot

I think trying to explain s6-linux-init + s6-rc through the lens of
runit's stages isn't a good idea.


 Carlos is correct - both here and in his explanation of s6-linux-init
stages.

 When designing s6-linux-init, I kept runit's "stage" terminology
because at the time it was a useful framework to see where things
should go; but in retrospect, it was probably a mistake, prone to
confusion, as exemplified by Steve's message. Even though, functionally,
the "stages" in runit and s6-l-i have similarities, the way they're
implemented and work under the hood is fundamentally different.

runit's stage 1 and stage 2 are similar, their difference is only
conventional: traditionally, runit runs all the oneshots in stage 1,
and stage 2 is just an execution of runsvdir - but a setup where
/etc/runit/1 is empty and all the oneshots are performed in
/etc/runit/2 prior to the execution of runsvdir would work as well.

s6-linux-init's stage 1 and stage 2 do not have the same similarity.
Stage 1 is the early init, running as pid 1; this is only the
/sbin/init script produced by s6-linux-init-maker, which executes
into the s6-linux-init binary (which sets up the system and execs
into s6-svscan); and users are not supposed to do anything with it.
 Stage 2, on the other hand, is the real boot sequence, running as
not-pid-1; it is only run when stage 1 has completed, which means
the system has an adequate long-running pid 1 process, a supervision
tree, and a catch-all logger - all the basic infrastructure is in
place and the services can be started. With s6-linux-init, stage 2
is where all the real work happens; and when the system's boot
sequence is complete, the stage 2 script simply exits and the
system keeps running until a shutdown command is issued.

 I want to keep the "stages" framing for s6-linux-init, because I think
it is useful: these are qualitatively different parts of the init
process. (There's a "stage 3" and a "stage 4" as well, at shutdown
time: they're handled by the s6-linux-init-shutdownd daemon.) The
framing is actually *more* useful here than in runit, where "stages"
are only sequence points and the only real hardcoded meaning is that
the system shuts down when /etc/runit/3 exits.



 The preceding is the best interpretation I could put together from
https://skarnet.org/software/s6-rc/overview.html,
https://skarnet.org/software/s6-rc/s6-rc.html, and discussions with
 you. What do I need to do to make the preceding sequence accurate?


 I don't know, honestly, Steve. At this point I cannot tell whether
you're acting in good or bad faith.

 You seem to be talking about s6 and s6-linux-init, yet only mention
the s6-rc documentation. You do not seem to have read the
https://skarnet.org/software/s6-linux-init/quickstart.html page,
which explains that s6-linux-init-maker is run offline, or the
https://skarnet.org/software/s6-linux-init/s6-linux-init.html page,
where the "Early preparation" part explains how stage 1 works. You
do not seem to have watched my FOSDEM 2017 video at
https://archive.fosdem.org/2017/schedule/event/s6_supervision/ where
I describe the various duties of an init system and how the components
in the s6 suite fill the boxes.

 Despite your claims to be interested, you have not put in s6 the
tenth of the effort you put in runit. It's been going on for ages.
You say you haven't paid much attention to the progress of s6, but
over the years the fundamentals have not changed, they've been the
same for a while now; the truth is that you have never paid much
attention to s6 at all. You come to the list once in a while and
ask a question that shows you are still lacking a basic understanding
of s6, an understanding that comes from two hours of experimenting
while having a browser open with 3-4 tabs to the documentation.
And then you seem to ignore the answers, and go away until the
next time when you come back just as helpless.

 You are clearly not dumb. So either you are a troll, or you need to
get a grip and realize that if you're really interested, you can do
the work of looking up and finding the relevant documentation,
experimenting, and finally getting the reward of feeling the pieces
of the puzzle fall into place, and acquiring the understanding that
has eluded you for so long and that you seem to crave. As a technical
writer, that is *your job*, so you can make the process easier for
other people.

 s6 is not as complex as you seem to think it is, far from it. There
are real, living people who understand how it works, and they're not
all acne-ridden nerds living in a basement. The documentation may
not be perfect, but it seems to be adequate. It lacks tutorials, yes,
but I expect tutorials to be written by *people like you*, who could
do a much better job of it than I ever would, if only they'd stop
acting like damsels in distress at the first mention of a Unix pipe.

 And if you're not interested, or simply not enough to really get
into it, that's okay too; you just need to own it and 

[announce] skarnet.org Winter 2021-2022 release

2021-12-21 Thread Laurent Bercot



 Hello,

 New versions of all the skarnet.org packages are available.

 The changes are, for the most part, minimal: essentially, the new
versions fix a bug in the build system that made cross-building under
slashpackage more difficult than intended. Very few people should
have been impacted by this bug.
 Some packages had a few more bugfixes; and some packages have
additional functionality. No major update; no compatibility break.

 The new versions are the following:

skalibs-2.11.1.0 (minor)
nsss-0.2.0.1 (release)
utmps-0.1.1.0(minor)
execline-2.8.2.0 (minor)
s6-2.11.0.1  (release)
s6-rc-0.5.3.0(minor)
s6-portable-utils-2.2.3.4(release)
s6-linux-utils-2.5.1.7   (release)
s6-linux-init-1.0.7.0(minor)
s6-dns-2.3.5.3   (release)
s6-networking-2.5.1.0(minor)
mdevd-0.1.5.1(release)
bcnm-0.0.1.5 (release)
dnsfunnel-0.0.1.3(release)
smtpd-starttls-proxy-0.0.1.1 (release)

 Dependencies have all been updated to the latest versions. They are not
strict: libraries and binaries may build with older releases of their
dependencies, although this is not guaranteed.

 You do not need to recompile your s6-rc service databases. To make use
of the new s6-linux-init functionality, however, you will have to
recreate your run-image.
 You do not need to restart your supervision tree, unless you're deleting
your old s6 binaries.

 Details of minor package changes follow.

* skalibs-2.11.1.0
  

 - New function: opendir_at()


* utmps-0.1.1.0
  

 - New binary: utmps-write, a generic utmp client that can write
user-crafted records to the utmp and/or wtmp databases.


* execline-2.8.2.0
  

 - New -s option to the case binary, enabling fnmatch() (shell)
expression matching instead of regular expression matching.


* s6-rc-0.5.3.0
  -

 - Bundle contents are now read in a "contents.d/" subdirectory, one
file per content, instead of one per line in a "contents" file. In
the same way, service dependencies are now read in a "dependencies.d/"
subdirectory, one file per dependency. Old "contents" and "dependencies"
files are still supported, but deprecated. This change allows better
integration of s6-rc service definitions with package managers.
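
 For illustration, here is a rough sketch of converting an existing source
definition directory to the new format (the directory names and paths are
just examples):

  # hypothetical bundle "net" and longrun "sshd" under /etc/s6-rc/source
  cd /etc/s6-rc/source
  mkdir net/contents.d
  for svc in $(cat net/contents) ; do touch "net/contents.d/$svc" ; done
  rm net/contents
  mkdir sshd/dependencies.d
  for dep in $(cat sshd/dependencies) ; do touch "sshd/dependencies.d/$dep" ; done
  rm sshd/dependencies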


* s6-linux-init-1.0.7.0
  -

 - New -S option to s6-linux-init-maker, forcing a sync on halt even
in a container.


* s6-networking-2.5.1.0
  -

 - SNI wildcarding is implemented, as well as a workaround for a
bearssl bug causing errors on certificate signatures in certain cases.


 Enjoy,
 Bug-reports welcome.
 And happy holidays to you all!

--
 Laurent



Re: A program that can get exactly the log of a supervised process?

2021-10-25 Thread Laurent Bercot




Why not have the grepper listen on the log file directly? You'll need to have a 
timestamp in the log and know where the log is, but those can be known at the 
time of writing the service script.


 There's no such thing as "the log file". There's the log backendS,
which can be one or more automatically rotated log files in a directory,
status files, network backends, etc.
 The only way to access the log stream in a reliable and generic way
is to read it on the way to the backends. You have no control over
what happens later on.

--
 Laurent



Re: A program that can get exactly the log of a supervised process?

2021-10-25 Thread Laurent Bercot

Well, I do realise the lifespan issue of the loggrep program, which is
why I asked the question in the first place.  But I really never thought
of directly inserting loggrep into the logging chain as a new node;
instead, what I have thought is making loggrep a program "attachable" to
the logger.  That is, the logger is extended with a listener which at
least one client can connect to, and which upon connection tees the log
to the client.  I do not know whether you have similar ideas.


 Well in theory you could have something like skabus-dyntee
( https://skarnet.org/software/skabus/skabus-dyntee.html ) as your
logger, and have your "real" logger run as a skabus-dyntee client.
Then you could add a loggrep as a second skabus-dyntee client, and it
would just disappear when it has finished its work.

 It would be a little unreliable as is: skabus-dyntee doesn't care
whether it has any clients at all, so if the real logger dies, the log
stream won't be blocked until the logger restarts, and you may lose
logs. But apart from that, it would work.

 A really reliable solution would be having a skabus-dyntee equivalent
that has one permanent output and blocks as long as that output isn't
there. As you say, it would be a logger extended with a listener.

 Another question is how to piggyback loggrep onto the notification
mechanism: if loggrep is tied to the logger and not to the service,
it doesn't have native access to the notification pipe. That means a
specific mechanism is needed to give it cross-service access.

 That's definitely a lot of code and a lot of architectural
convolutions to accommodate what is ultimately a daemon misdesign.
But it's probably the least bad way to do it, so I might think about it
more and add something like that to s6 in the distant future.

--
 Laurent



Re: A program that can get exactly the log of a supervised process?

2021-10-24 Thread Laurent Bercot

Any idea on how the log "teeing" may be done cleanly (and portably
if possible; something akin to `tail -f' seems unsuitable because of
potential log rotation), and perhaps any flaw or redundancy in the
design above?


 The obstacle I have always bumped against when trying to do similar
things is that the teeing program always has to remain there, even
after it has done its job (it has read the readiness line and informed
the supervisor). And so, instead of "service | logger", your data
flow permanently becomes "service | loggrep | logger" before
readiness, and "service | cat | logger" after readiness (the best
loggrep can do is exec into cat, or reimplement its functionality,
once it has read its readiness line).
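
 To make the idea concrete, here is a very rough sh sketch of such a
loggrep. The readiness pattern is an argument, and the "inform the
supervisor" part is reduced to writing a newline to an arbitrary fd 5,
which is purely hypothetical - a real implementation would need to be
wired into the supervisor's notification mechanism somehow:

#!/bin/sh
# loggrep sketch: copy stdin to stdout; when a line matching "$1" is
# seen, write a newline to fd 5 (stand-in for a readiness notification),
# then keep copying as a plain cat.
pattern="$1"
while IFS= read -r line ; do
  printf '%s\n' "$line"
  case "$line" in
    *"$pattern"*)
      printf '\n' >&5
      exec cat
    ;;
  esac
done
exec cat   # EOF before the pattern was seen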

 That wouldn't be a huge performance problem, especially if "cat" can
do zero-copy data, but it is definitely a reliability problem:
 - loggrep must die when the service dies, so a new loggrep can be run
when the service runs again. So loggrep cannot be run as a separate
supervised service in the same service pipeline. (If loggrep were to
restart independently from service, it would need to check whether
service is ready, and run as 'cat' if it is. This is doable, but more
complex.)
 - That means that either the pipe between service and loggrep cannot
be held, or loggrep must have an additional notification that service
died. This is, again, doable, but more complex.
 - If loggrep isn't supervised, and the pipe isn't being held, then
killing loggrep will incur a broken pipe, which means a service restart
with a lost line of logs, which supervision aims to avoid.

 So basically, either loggrep is a simple tee-like program but you
weaken the supervision properties of the service, or the functionality
needs to be embedded in the supervision architecture, with loggrep
being a consumer for service and a producer for logger (which is
easy with s6-rc but not so much with pure s6) and loggrep always
watching the state of service (which is not so easy with s6-rc, where
you don't know the full path to another service directory).

 In short: practical issues. It's impossible to do that in a clean,
satisfying way.

 And it entirely makes sense that it's so difficult, because the very
idea is to use the *data flow* to inform the *control flow*, and that
is inherently dangerous and not planned for in supervision 
architectures.

Making your control flow depend on your data flow is not a good pattern
at all, and I really wish daemons would stop doing that.

 I'm sorry I don't have a better answer.

--
 Laurent



Re: logging services with shell interaction

2021-10-19 Thread Laurent Bercot

we have a fair number of services which allow (and occasionally require) user 
interaction via a (built-in) shell. All the shell interaction is supposed to be 
logged, in addition to all the messages that are issued spontaneously by the 
process. So we cannot directly use a logger attached to the stdout/stderr of 
the process.


 I don't understand the consequence relationship here.

 - If you control your services / builtin shells, the services could
have an option to log the IO of their shells to stderr, as well as
their own messages.
 - Even if you cannot make the services log the shell IO, you can add
a small data dumper in front of the service's shell, which transmits
full-duplex everything it gets but also writes it to its own stdout or
stderr; if that stdout/err is the same pipe as the stdout/err of your
service, then all the IO from the shell will be logged to the same place
(and log lines won't be mixed unless they're more than PIPE_BUF bytes
long, which shouldn't happen in practice). So with that solution you
could definitely make your services log to multilog.



procServ is a process supervisor adapted to such situations. It allows an 
external process (conserver in our case) to attach to the service's shell via a 
TCP or UNIX domain socket. procServ supports logging everything it sees (input 
and output) to a file or stdout.


 That works too.



```
IOC=$1

/usr/bin/procServ -f -L- --logstamp --timefmt="$TIMEFMT" \
 -q -n %i --ignore=^D^C^] -P "unix:$RUNDIR/$IOC" -c "$BOOTDIR" "./$STCMD" \
 | /usr/bin/multilog "s$LOGSIZE" "n$LOGNUM" "$LOGDIR/$IOC"
```

So far this seems to do the job, but I have two questions:

1. Is there anything "bad" about this approach? Most supervision tools have 
this sort of thing as a built-in feature and I suspect there may be a reason for that 
other than mere convenience.


 It's not *bad*, it's just not as airtight as supervision suites make
it. The reasons why it's a built-in feature in daemontools/runit/s6/others
are:
 - it allows the logger process to be supervised as well
 - it maintains open the pipe to the logger, so service and logger can
be restarted independently at will, without risk of losing logs.

 As is, you can't send signals to multilog (useful if you want to force
a rotation) without knowing its pid. And if multilog dies, procServ gets
a broken pipe, and it (and your service) is probably forced to restart,
and you lose the data that multilog wanted to write.
 A supervision architecture with integrated logging protects from this.



2. Do any of the existing process supervision tools support what procServ gives 
us wrt interactive shell access from outside?


 Not that I know of, because that need is pretty specific to your
service architecture.
 However, unless there are more details you have omitted, I still
believe you could obtain the same functionality with a daemontools/etc.
infrastructure and a program recording the IO from/to the shell. Since
you don't seem opposed to using old djb programs, you could probably
even directly reuse "recordio" from ucspi-tcp for this. :)
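
 For instance, recordio relays its child's stdin and stdout while
writing a copy of both directions to fd 2, so a two-minute experiment
(the transcript file name is arbitrary) shows what you would get:

  recordio cat 2>>/tmp/shell-transcript

In a real run script you would point that fd 2 at the same pipe as the
service's stdout (exec 2>&1), so the transcript lands in the same
multilog as everything else.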

--
 Laurent



Re: Service watchdog

2021-10-19 Thread Laurent Bercot

Yes, in my usecase this would be used at the place where sd_notify()
is used if the service runs under systemd. Then periodically executed
watchdog could check the service makes progress and react if it
doesn't.

The question is how to implement the watchdog then - it could be either
a global service or another executable in service directory, which
would be started periodically by runsv.


 If a single notification step is enough for you, i.e. the service
goes from a "preparing" state to a "ready" state and remains ready
until the process dies, then what you want is implemented in the s6
process supervisor: https://skarnet.org/software/s6/notifywhenup.html

 Then you can synchronously wait for service readiness
(s6-svwait $service) or, if you have a watchdog service, periodically
poll for readiness (s6-svstat -r $service).
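
 A watchdog service along these lines would do for the polling variant;
the path, interval and recovery action are placeholders, and you should
double-check the exact option names against your s6 version:

#!/bin/sh
# run script of a hypothetical watchdog: every 60 seconds, check that
# /run/service/foo reports readiness; if not, ask for a restart.
while : ; do
  sleep 60
  if [ "$(s6-svstat -o ready /run/service/foo)" != true ] ; then
    s6-svc -r /run/service/foo   # or any other recovery action you prefer
  fi
done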

 But that's only valid if your service can only change states once
(from "not ready" to "ready"). If you need anything more complex, s6
won't support it intrinsically.

 The reason why there isn't more advanced support for this in any
supervision suite (save systemd but even there it's pretty minimal)
is that service states other than "not ready yet" and "ready" are
very much service-dependent and it's impossible for a generic process
supervisor to support enough states for every possible existing service.
Daemons that need complex states usually come with their own
monitoring software that handles their specific states, with integrated
health checks etc.

 So my advice would be:
 - if what you need is just readiness notification, switch to s6.
It's very similar to runit and I think you'll find it has other
benefits as well. The drawback, obviously, is that it's not in busybox
and the required effort to switch may not be worth it.
 - if you need anything more complex, you can stick to runit, but you
will kinda need to write your own monitor for your daemon, because
that's what everyone does.

 Depending on the details of the monitoring you need, the monitoring
software can be implemented as another service (e.g. to receive
heartbeats from your daemon), or as a polling client (e.g. to do
periodic health checks). Both approaches are valid.

 Don't hack on runit, especially the control pipe thing. It will not
end well.
 (runit's control pipe feature is super dangerous, because it allows a
service to hijack the control flow of its supervisor, which endangers
the supervisor's safety. That's why s6 does not implement it; it
provides similar - albeit slightly less powerful - control features
via ways that never give the service any power over the supervisor.)

--
 Laurent



Re: Readiness notification using signals

2021-09-29 Thread Laurent Bercot



 Hi Carlos,


I'm supervising an instance of an X Server using s6.

X.Org has a built-in readiness notification mechanism: it sends a USR1
to its parent process once it's ready. From what I know, it would be
s6-supervise.


 This...  this is appalling.

 This mechanism is a terrible way to notify readiness.
 1. It forces the program reading readiness information to be the
direct parent of the daemon.
 2. This is not a concern for X, but if the daemon has dropped
privileges, it is very possible that it does not have the rights to
signal its parent anymore.
 3. It forces the parent to use the siginfo_t structure, and more
complex signal management, in order to detect what process is sending
the signal. This can work, but is by no means a common use of signals.
 4. Signals can be lost if several signals of the same value are
received by a process. So if two processes are notifying readiness at
the same time to a unique supervisor via SIGUSR1, it is possible that
the supervisor only gets one of them.



However, after reading the documentation, there doesn't seem to be a
way to set up custom USR1 handlers for s6-supervise.


 Indeed, there isn't. Signals are meant to be a control mechanism from
above, not from below: the administrator is supposed to be able to send
signals to s6-supervise, but the supervised process *definitely is not*.



 As far as I know,
this leaves me with two, not quite ideal solutions: make the run
script spawn X and do readiness notification on its behalf (which
lengthens the supervision chain), or poll it (which is what I've been
doing). Is there a way to not let the information from this USR1 not
"go to waste"?


 Let this "information" go to waste: it should never be sent in the
first place, and if patching were to happen, it should be to delete
that chunk of code from the Xorg server. We should consider ourselves
lucky that s6-supervise is not using SIGUSR1 for its own purposes and
is happily ignoring it as is.

 Fortunately, there is a different solution for you: the -displayfd option
to the X command will print the display number, followed by a newline,
to the (argument of -displayfd) file descriptor, when X has found a
display. This only happens when X is ready - unless it only starts
listening to its socket later, but the window should be super small; and
it has the correct format for an s6 notification.
 So, using displayfd as your notification fd will work.
 For instance, if you have 3 in your notification-fd file, starting
X -displayfd 3 should properly (or *almost* properly) notify readiness.
Some people already use that mechanism, and are happy with it. :)
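
 Concretely, if the service directory's notification-fd file contains 3,
the whole run script can be as simple as this (display number and X path
are just examples):

#!/bin/sh
# fd 3 is the fd declared in ./notification-fd; -displayfd 3 makes X
# write the display number plus a newline there once it has a display,
# which doubles as the s6 readiness notification.
exec /usr/bin/X :0 -displayfd 3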

 Good luck,

--
 Laurent



Re: [announce] skarnet.org Fall 2021 release

2021-09-26 Thread Laurent Bercot

just a minor thing.  You probably didn't push to s6-rc repository.
I don't see the new v0.5.2.3 version commit and tag.


 Weird. I must have missed a case with my script.
 Thanks for the report; it should be fixed now. :)

--
 Laurent



[announce] skarnet.org Fall 2021 release

2021-09-26 Thread Laurent Bercot



 Hello,

 New versions of all the skarnet.org packages are available.

 skalibs has undergone a major update, with a few APIs having disappeared,
and others having changed. Compatibility with previous versions is *not*
assured.
 Consequently, all the rest of the skarnet.org software has undergone
at least a release bump, in order to build with the new skalibs. But
some packages also have new functionality added (hence, a minor bump),
and others also have their own incompatible changes (hence, a major 
bump).


 The new versions are the following:

skalibs-2.11.0.0  (major)
nsss-0.2.0.0  (major)
utmps-0.1.0.3 (release)
execline-2.8.1.0  (minor)
s6-2.11.0.0   (major)
s6-rc-0.5.2.3 (release)
s6-portable-utils-2.2.3.3 (release)
s6-linux-utils-2.5.1.6(release)
s6-linux-init-1.0.6.4 (release)
s6-dns-2.3.5.2(release)
s6-networking-2.5.0.0 (major)
mdevd-0.1.5.0 (minor)
bcnm-0.0.1.4  (release)
dnsfunnel-0.0.1.2 (release)

Additionally, a new package has been released:
smtpd-starttls-proxy-0.0.1.0

 Dependencies have all been updated to the latest versions. They are,
this time, partially strict: libraries and binaries may build with older
releases of their dependencies, but not across major version bumps. The
safest approach is to upgrade everything at the same time.

 You do not need to recompile your s6-rc service databases or recreate
your s6-linux-init run-images.
 You should restart your supervision tree after upgrading skalibs and s6,
as soon as is convenient for you.

 Details of major and minor package changes follow.


* skalibs-2.11.0.0
  

 - A lot of obsolete or useless functionality has been removed:
libbiguint, rc4, md5, iobuffer, skasigaction, environ.h and
getpeereid.h headers, various functions that have not proven their
value in a while.
 - Some functions changed signatures or changed names, or both.
 - All custom types ending in _t have been renamed, to avoid treading on
POSIX  namespace. (The same change has not been done yet in other
packages,  but skalibs was the biggest offender by far.)
 - Signal functions have been deeply reworked.
 - cdb has been reworked, the API is now more user-friendly.
 - New functions have been added.

 The deletion of significant portions of code has made skalibs leaner.
libskarnet.so has dropped under 190 kB on x86_64.
 The cdb rewrite on its own has helped reduce an important amount of
boilerplate in cdb-using code.
 All in all, code linked against the new  skalibs should be slightly
smaller and use a tiny bit less RAM.

 https://skarnet.org/software/skalibs/
 git://git.skarnet.org/skalibs


* nsss-0.2.0.0
  

 - Bugfixes.
 - nsss-switch wire protocol slightly modified, which is enough to
warrant a major version bump.
 - _r functions are now entirely thread-safe.
 - Spawned nsssd programs are now persistent and only expire after a
timeout on non-enumeration queries. This saves a lot of forking with
applications that can call  primitives such as getpwnam() repeatedly, as
e.g. mdevd does when  initially parsing its configuration file.
 - New nsssd-switch program, implementing real nsswitch functionality
by dispatching queries to various backends according to a script.
It does not dlopen a single library or read a single config file.

 https://skarnet.org/software/nsss/
 git://git.skarnet.org/nsss


* execline-2.8.1.0
  

 - Bugfixes.
 - New binary: case. It compares a value against a series of regular
expressions, executing into another command line on the first match.

 https://skarnet.org/software/execline/
 git://git.skarnet.org/execline


* s6-2.11.0.0
  ---

 - Bugfixes.
 - Some libs6 header names have been simplified.
 - s6-svwait now accepts -r and -R options.
 - s6-supervise now reads an optional lock-fd file in the service
directory; if it finds one, the first action of the service is to take
a blocking lock. This prevents confusion when a controller process dies
while still leaving workers holding resources; it also prevents log
spamming on user mistakes (autobackgrounding services, notably).
 - New binaries: s6-socklog, s6-svlink, s6-svunlink. The former is a
rewrite of smarden.org's socklog program, in order to implement a fully
functional syslogd with only s6 programs. The latter are tools that start
and stop services by symlinking/unlinking service directories from a
scan directory, in order to make it easier to integrate s6-style services
in boot scripts for sequential service managers such as OpenRC.

 https://skarnet.org/software/s6/
 git://git.skarnet.org/s6


* s6-networking-2.5.0.0
  -

 - Bugfixes.
 - minidentd has been removed. It was an old and somehow still buggy
piece of  code that was only hanging around for nostalgia reasons.
 - Full support for client certificates. Details of the client
certificate are transmitted to the application via 

Re: First time caller to the show - am I understanding the fifo trick correctly?

2021-08-25 Thread Laurent Bercot

Forgiving privilege separation failures and minor grammatical mistakes, does it 
look as if I understand the fifo trick's application in practice?


 Hi Ellenor,
 Yes, I think you have the right idea.

 The goal here is to redirect s6-svscan's own stdout and stderr to
the stdin of the catch-all logger process, so that the supervision
tree's messages, and the messages from every service that lacks a
dedicated logger, go to the catch-all logger instead of /dev/console.
(Because /dev/console is a terrible default place to send logs and
should only be used for very critical messages such as kernel panics,
or, in our userland case, for catch-all logger failures.)

 The problem is that we want the catch-all logger to run as a service
under the supervision tree, so the s6-log process does not exist yet
when we exec into s6-svscan: it will be spawned later as a grandchild
of s6-svscan (with an s6-supervise intermediary). So we cannot use an
anonymous pipe for this.

 We use a fifo instead: we can redirect init's stdout and stderr to
a fifo, and later on, when the catch-all logger starts, we can
instruct it (in its run script) to read from the fifo.

 But the Unix fifo semantics say that we *cannot* open a fifo for
writing while there is no reader: open() would either block (default
flags) or return -1 ENXIO (with O_NONBLOCK). So the "fifo trick" is:
1. open the fifo for reading
2. open it for writing, which now works
3. close the reading end

At this point, any write() to the fifo will fail with -1 EPIPE. That is
not a problem per se, except it will also generate a SIGPIPE, so in
order to avoid crashing and burning, it is important to ignore SIGPIPE
at the very least - or, better, to make sure that no process writes to
the fifo until the catch-all logger is up. This is the case for s6-svscan
and s6-supervise, so our system structure is safe; but we need to make
sure that no other process starts before the catch-all logger is up,
else they will just eat a SIGPIPE and die.

 In the s6-l-i model, s6-svscan is executed as soon as possible, on a
very minimal supervision tree that only contains the catch-all logger
and a few other essential "early services" (such as the shutdown daemon
and an early getty). All the rest of the initialization is done in
"stage 2 init", which is a script run as a child of s6-l-i's.
So the end of the "fifo trick" uses the Unix fifo semantics as a
synchronization mechanism:
4. fork
5. In the child, close our fd to the fifo
6. In the child, open the fifo for writing once again,
   *without* O_NONBLOCK.

 This last open() will block until the fifo has a reader. That
ensures the child will only resume once the parent has completed
its work and executed into s6-svscan, and the supervision tree has
started and the catch-all logger is running. Then the child can exec
into stage 2 init and perform the rest of the work with the guarantee
that the supervision tree is operational and all the stdout and stderr
messages go to the catch-all logger by default.
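
 For the curious, here is a rough plain-sh approximation of the whole
dance. It cheats a little: step 1 uses an O_RDWR open, because sh has no
way to open a fifo O_RDONLY|O_NONBLOCK; and the paths, as well as the
stage 2 script name, are only examples - none of this is the real
s6-linux-init code:

#!/bin/sh
# stage 1 sketch, NOT the real s6-linux-init implementation
fifo=/run/service/s6-svscan-log/fifo
mkfifo -m 0600 "$fifo"

# steps 1-3: obtain a writing fd to the fifo without blocking
exec 3<>"$fifo"       # cheat: O_RDWR instead of O_RDONLY|O_NONBLOCK
exec 4>"$fifo"        # succeeds, because fd 3 counts as a reader
exec 3<&-             # close the reading end
exec >&4 2>&4 4>&-    # our stdout and stderr now go to the fifo

# steps 4-6: the child blocks until the catch-all logger opens the fifo
( exec >"$fifo" 2>&1     # this open blocks until a reader shows up
  exec /etc/rc.init ) &  # then exec into stage 2

# the parent execs into s6-svscan, which stays pid 1 forever
exec s6-svscan /run/service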

 To see exactly how to implement stage 1 init and the fifo trick as
an execline script, you can check out (or download) any version of
s6-l-i *prior to* 1.0.0.0; try version 0.4.0.1, downloadable from
skarnet.org if you type the URL by hand, and accessible via the
v0.4.0.1 tag in git. It is very different from what it is now, as in
there is no sysv compatibility at all, but stage 1 should be
understandable.

 A few months ago, I tried adding a few conditional compilation options
to s6-l-i to make it work under FreeBSD, but unfortunately the
organization of the FreeBSD init is so different from Linux's,
especially shutdown-wise, that my attempt only succeeded in turning
the package into an unholy plate of spaghetti. At some point in the
future, however, a similar-but-separate s6-freebsd-init package may
make sense.

--
 Laurent



Re: s6-rc-init verbose equivalent messages?

2021-08-19 Thread Laurent Bercot

# /usr/local/bin/s6-rc-init -c /s/comp -l /s/run /s/scan
s6-rc-init: fatal: unable to supervise service directories in
/s/run/servicedirs: No such file or directory
I've completed a disk-disk copy, as I need to integrate s6 into
hardenedbsd.


 Do you have a s6-svscan process running on /s/scan ? You need one
before you can run s6-rc-init.

 If you have one, please get a strace of s6-rc-init, if you can:
that will tell exactly what's happening. But this error generally
points to a supervision tree that's not set up properly.

--
 Laurent



Re: S6 Queries

2021-08-11 Thread Laurent Bercot




Thanks Laurent for the detailed explanations. We did a bootup speed
comparison between S6 and systemd. S6 is able to boot up slightly faster
than systemd. Actual result is 4-4.5% faster but we were expecting
something near to 20%.
Ours is a bit complex setup with more than 140 services (includes a lot of
long run services and a lot of dependencies). The main advantage in systemd
is, it starts many critical processes very quickly since it has no
dependency to logging services. We collect the logs from journalctl and
store it in log files. Whereas in S6, the critical services start up is a
bit delayed since it has to depend on logging services which in turn
depends on other services (responsible for backing up the previous logs).


 Thank you for these numbers! Indeed they confirm my intuition: booting
via s6 is a little faster than systemd, but nothing extraordinary,
because s6-rc emphasizes reliability over speed.

 Your critical services may be slightly delayed, but you have the
guarantee that you will not lose logs - whereas with systemd, if a piece
of your logging system fails to start, your critical services may already
have produced some data that will vanish into the aether. Whether or not
that's an acceptable risk is up to you.

--
 Laurent



Re: S6 Queries

2021-08-02 Thread Laurent Bercot

Do you think this is any better?

=
#!/bin/sh
test_for_myrequirement || exit 1
exec mydaemon -myarg1 -myarg2
=


 This does not accomplish the same thing at all: it does not ensure
that myrequirement is at least attempted before mydaemon runs. Instead,
it conditions the readiness of mydaemon to that of myrequirement. So,
it is "better" in the sense that it does not control another service
from a run script, but it is even further from what the OP wants.

 Any reference to another service in a  run script is going to be quirky
at best. Managing all kinds of dependencies between services is really
best done *outside* of run scripts, which is why s6-rc exists. It does
not currently have all the expressive power of systemd for dependencies,
but in the future, it will.

--
 Laurent



Re: S6 Queries

2021-08-02 Thread Laurent Bercot

I thought the way to do what the OP asked is:

=
#!/bin/sh
s6-svc -u myrequirement || exit 1
exec mydaemon -myarg1 -myarg2
=


 This is not a good idea in a s6-rc installation, because it sends
raw s6 commands, which may mess with the service state as viewed by
s6-rc. Also, it sends control commands to a service from another
service's run script, which is bad form in general: it is
unintuitive that starting the mydaemon service also causes the
myrequirement service to be started. Dependencies should be handled at
the service manager level, not at the process supervision level.

 Of course, you can do that in a pinch for small s6-only installations,
where you have no proper dependency engine, but that does not seem to be
what the OP is asking.

 And even then, this does not implement After= because s6-svc -u
returns instantly. This only implements Wants=. To implement After=,
you would need something like s6-svc -uwU instead, which is not good
because it adds the myrequirement readiness delay to the mydaemon
readiness delay, so if mydaemon has strict readiness timeouts, it
can make it fail.

 All in all, it's better to avoid controlling another service in a run
script: there's always an annoying corner case somewhere. Dependency
management is best handled by the external tool that is explicitly
supposed to do it: the service manager.

--
 Laurent



Re: S6 Queries

2021-08-02 Thread Laurent Bercot

1. In systemd, the services are grouped as targets and each target depends
on another target as well. They start as targets. [ex: Reached
local-fs.target, Reached network.target, Reached UI target,...]. Is there
any way in S6 to start the init system based on bundles?


 Yes, that is what bundles are for. In your stage 2 boot script
(typically /etc/s6-linux-init/current/scripts/rc.init), you should
invoke s6-rc as:
  s6-rc change top
if "top" is the name of your default bundle, i.e. the bundle that
contains all the services you want to start at boot time. You can
basically convert the contents of your systemd targets directly into
contents of your s6-rc bundles; and you decide which one will be
brought up at boot time via the s6-rc invocation in your stage 2
init script.
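
 A minimal rc.init can then be as small as this (paths are the usual
defaults, adjust to your layout; "top" is whatever you named your
default bundle):

#!/bin/sh -e
# stage 2 sketch: initialize the s6-rc live state, then bring up the
# default bundle.
s6-rc-init -c /etc/s6-rc/compiled -l /run/s6-rc /run/service
exec s6-rc -v2 -u change top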



2. Are there any ways to have loosely coupling dependencies? In systemd, we
have After=. After option will help the current service to start after the
mentioned service (in after). And the current service will anyway start
even if the mentioned service in After fails to start. Do we have such
loosely coupled dependency facility in S6?


 Not at the moment, no. The next version of s6-rc will allow more types
of dependencies, with clearer semantics than the systemd ones (After=,
Requires= and Wants= are not orthogonal, which is unintuitive and causes
misuse); but it is still in early development.

 For now, s6-rc only provides one type of dependency, which is the
equivalent of Requires+After. I realize this is not flexible enough
for a lot of real use cases, which is one of the reasons why another
version is in development. :)



3. Is there any tool available in S6 to measure the time taken by each
service to start? We can manually measure it from the logs, but still
looking for a tool which can provide accurate data.


 Honestly, if you use the -v2 option to your s6-rc invocation, as in
  s6-rc -v2 change top
and you ask the catch-all logger to timestamp its lines (which should
be the default, but you can change the timestamp style via the -t
option to s6-linux-init-maker)
then the difference of timestamps between the lines:
  s6-rc: info: service foo: starting
and
  s6-rc: info: service foo successfully started
will give you a pretty accurate measurement of the time it took service
foo to start. These lines are written by s6-rc exactly as the
"starting" or "completed" event occurs, and they are timestamped by
s6-log immediately; the code path is the same for both events, so the
delays cancel out, and the only inaccuracy left is randomness due to
scheduling, which should not be statistically significant.

 At the moment, the s6-rc log is the easiest place to get this data
from. You could probably hack something with the "time" shell command
and s6-svwait, such as
  s6-svwait -u /run/service/foo ; time s6-svwait -U /run/service/foo
which would give you the time it took for foo to become ready; but
I doubt it would be any more accurate than using the timestamps in the
s6-rc logs, and it's really not convenient to set up.



4. Does the S6 init system provide better boot up performance compared to
systemd ?  One of our main motives is to attain better bootup performance.
Is our expectation correct?


 The boot up performance should be more or less *similar* to systemd.
The code paths used by the s6 ecosystem are much shorter than the ones
used by systemd, so _in theory_ you should get faster boots with s6.

 However, systemd cheats, by starting services before their dependencies
are ready. For instance, it will start services before any kind of
logging is ready, which is pretty dangerous for several reasons. As a
part of its "socket activation" thing, it will also claim readiness
on some sockets before even attempting to run the actual serving
processes (which may fail, in which case readiness was a lie.)
Because of that, when everything goes well, systemd cuts corners that
s6 does not, and may gain some advantage.

 So all in all, I expect that depending on your system, the difference
in speed will not be remarkable. On very simple setups (just a few
services), systemd's overhead may be noticeable and you may see real
improvements with s6. On complex setups with lots of dependencies, s6
might still have the speed advantage but I don't think it will be
anything amazing. The real benefit of s6 is that it achieves
roughly the same speed as systemd *while being more reliable and
predictable*.

 If you actually make bootup speed comparisons between systemd and s6,
please share them! I am interested in that kind of benchmarks, and
I'm sure the community would like to see numbers as well.

--
 Laurent



Re: Query on S6 system shutdown

2021-07-29 Thread Laurent Bercot

I believe the finish script is not being called by s6-svc. When I run it
manually , the finish script runs and kills the process and graceful
shutdown is happening as expected.

What may be the cause for not triggering the finish script of critical
service.


 The finish script, which is entirely optional, is not supposed to
kill the process. It is not called by s6-svc, and does not run when
the process is still alive. It is *not* equivalent to the "down"
script of a oneshot. (There is a subtle hint in the fact that the
scripts are named "run" and "finish" instead of "up" and "down".)

 If it exists, the finish script is run by s6-supervise *after* the
service dies. The point is to perform any potentially needed cleanups
that are not automatically performed at service death.

 Your current finish script is entirely incorrect and you should
delete it.

 In order to kill your service, you need to send it a signal. Daemons
normally have a signal that tells them to gracefully exit; for most
daemons, it is SIGTERM, which is what svc -d sends by default.

 If your critical.service does not die on s6-svc -d, it means that
it ignores SIGTERM. Then you need to find out *what* signal it
interprets as a request for a graceful shutdown, and put the name
of this signal in the "down-signal" file in your service definition
directory. Then, when you send s6-svc -d, critical.service will
receive the signal you indicated.

 down-signal can be SIGKILL - this is what you're using at the moment -
but if your service is that critical, chances are it's not the best way
to kill it and you should look for a gentler signal to send it first.
If there are cases where a graceful exit request is ignored or takes
too long, and you want to make sure the service dies at some point,
you can create a "timeout-kill" file, containing a number of milliseconds:
if the process is still alive that long after receiving its down-signal,
it will be sent a SIGKILL.
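
 For example, in critical.service's service directory (the signal choice
here is hypothetical - use whatever your daemon actually documents for a
graceful shutdown):

  echo SIGHUP > down-signal    # signal sent by s6-svc -d
  echo 5000 > timeout-kill     # SIGKILL if still alive 5 seconds later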

--
 Laurent



Re: Query on s6-log and s6-supervise

2021-06-09 Thread Laurent Bercot

I have checked the Private_Dirty memory in "smaps" of a s6-supervise
process and I don't see any consuming above 8kB. Just posting it here
for reference.


 Indeed, each mapping is small, but you have *a lot* of them. The
sum of all the Private_Dirty in your mappings, that should be shown
in smaps_rollup, is 96 kB. 24 pages! That is _huge_.

 In this list, the mappings that are really used by s6-supervise (i.e.
the incompressible amount of unshareable memory) are the following:

 - the /bin/s6-supervise section: this is static data, s6-supervise
needs a little, but it should not take more than one page.

 - the [heap] section: this is dynamically allocated memory, and for
s6-supervise it should not be bigger than 4 kB. s6-supervise does not
allocate dynamic memory itself, the presence of a heap section is due
to opendir() which needs dynamic buffers; the size of the buffer is
determined by the libc, and anything more than one page is wasteful.

( - anonymous mappings are really memory dynamically allocated for
internal  libc purposes; they do not show up in [heap] because they're
not obtained via malloc(). No function used by s6-supervise should
ever need those; any anonymous mapping you see is libc shenanigans
and counts as overhead. )

 - the [stack] section: this is difficult to control because the
amount of stack a process uses depends a lot on the compiler, the
compilation flags, etc. When built with -O2, s6-supervise should not
use more than 2-3 pages of stack. This includes a one-page buffer to
read from notification-fd; I can probably reduce the size of this
buffer and make sure the amount of needed stack pages never goes
above 2.

 So in total, the incompressible amount of private mappings is 4 to 5
pages (16 to 20 kB). All the other mappings are libc overhead.

 - the libpthread-2.31.so mapping uses 8 kB
 - the librt-2.31.so mapping uses 8 kB
 - the libc-2.31.so mapping uses 16 kB
 - the libskarnet.so mapping uses 12 kB
 - ld.so, the dynamic linker itself, uses 16 kB
 - there are 16 kB of anonymous mappings

 This is some serious waste; unfortunately, it's pretty much to be
expected from glibc, which suffers from decades of misdesign and
tunnel vision especially where dynamic linking is concerned. We are,
unfortunately, experiencing the consequences of technical debt.

 Linking against the static version of skalibs (--enable-allstatic)
should save you at least 12 kB (and probably 16) per instance of
s6-supervise. You should have noticed the improvement; your amount of
private memory should have dropped by at least 1.5MB when you switched
to --enable-allstatic.
 But I understand it is not enough.

 Unfortunately, once you have removed the libskarnet.so mappings,
it's basically down to the libc, and to achieve further improvements
I have no other suggestions than to change libcs.


If possible, can you please share us a reference smap and ps_mem data on
s6-supervise. That would really help.


 I don't use ps_mem, but here are the details of a s6-supervise process
on the skarnet.org server. s6 is linked statically against the musl
libc, which means:
 - the text segments are bigger (drawback of static linking)
 - there are fewer mappings (advantage of static linking, but even when
you're linking dynamically against musl it maps as little as it can)
 - the mappings have little libc overhead (advantage of musl)

# cat smaps_rollup

0040-7ffd53096000 ---p  00:00 0  [rollup]
Rss:  64 kB
Pss:  36 kB
Pss_Anon: 20 kB
Pss_File: 16 kB
Pss_Shmem: 0 kB
Shared_Clean: 40 kB
Shared_Dirty:  0 kB
Private_Clean: 8 kB
Private_Dirty:16 kB
Referenced:   64 kB
Anonymous:20 kB
LazyFree:  0 kB
AnonHugePages: 0 kB
ShmemPmdMapped:0 kB
FilePmdMapped: 0 kB
Shared_Hugetlb:0 kB
Private_Hugetlb:   0 kB
Swap:  0 kB
SwapPss:   0 kB
Locked:0 kB

 You can see 40kB of shared, 16kB of Private_Dirty, and 8kB of
Private_Clean - apparently there's one Private_Clean page of static
data and one of stack; I have no idea what this corresponds to in the
code, I will need to investigate and see if it can be trimmed down.

# grep -E '[[:space:]](-|r)(-|w)(-|x)(-|p)[[:space:]]|^Private_Dirty:' smaps

0040-00409000 r-xp  ca:00 659178  /command/s6-supervise
Private_Dirty: 0 kB
00609000-0060b000 rw-p 9000 ca:00 659178  /command/s6-supervise
Private_Dirty: 4 kB
02462000-02463000 ---p  00:00 0  [heap]
Private_Dirty: 0 kB
02463000-02464000 rw-p  00:00 0  [heap]
Private_Dirty: 4 kB
7ffd53036000-7ffd53057000 rw-p  00:00 0  [stack]
Private_Dirty: 8 kB
7ffd5309-7ffd53094000 r--p  00:00 0  [vvar]
Private_Dirty: 0 kB
7ffd53094000-7ffd53096000 r-xp  00:00 0  [vdso]
Private_Dirty: 0 kB

 One page of static data, one page 

Re: Suspend s6-log writing to log file and resume it back after sometime

2021-06-08 Thread Laurent Bercot

Any pointers on how I can go about this? Is there any hack or tricks that could 
be done in s6-log to achieve this?


 Sorry, but no, nothing comes to mind - s6-log was not designed for 
this.


 I don't think expecting services to keep running while not logging to
disk, whether or not in standby/sleep mode, is reasonable: if logs keep
coming up, memory fills up. What do you do if the machine doesn't wake
up before the memory is full? The logger will die and you will lose all
your carefully accumulated logs.

 Ideally, you would have dynamic verbosity in the service, and switch
it to zero when going into standby/sleep mode, so it would stop
producing logs, so you'd never wake up the disk. Of course, unless it's
the wakeup event listener, the concept of still having a service running
when in standby mode is weird: it defeats the very purpose of standby
mode, which is saving energy. The best way to not have your disk spin is
to have nothing to make it spin in the first place. :P

 s6-svc -p all your services when entering standby mode! (Except the
wakeup event listener.) :D

 Sorry for being unhelpful, and good luck,

--
 Laurent



Re: Query on s6-log and s6-supervise

2021-06-08 Thread Laurent Bercot

   1. Why do we need to have separate supervisors for producer and consumer
   long run services? Is it possible to have one supervisor for both producer
   and consumer, because anyhow the consumer service need not to run when the
   producer is down.  I can understand that s6 supervisor is meant to monitor
   only one service, but why not monitor a couple of services when it is
   logically valid if I am not wrong.


 Hi Arjun,

 The logic of the supervisor is already complex enough when it has
to monitor one process. It would be quadratically as complex if it
had to monitor two. In all likeliness, the first impact of such a
change would be more bugs, because the logic would be a lot more
difficult to understand and maintain.

 The amount of memory used by the s6 logic itself would not change
(or would *increase* somewhat) if the code was organized in a
different way in order to reduce the amount of processes, and you
would see an overall decrease in code quality.

 Worsening the design to offset operational costs is not a good
trade-off - it is not "logically valid", as you put it. I would not
do it even if the high amount of memory consumed by your processes
was due to s6 itself.

 But it is not the case: your operational costs are due to something
else. See below.




   2. Is it possible to have a single supervisor for a bundle of services?
   Like, one supervisor for a bundle (consisting of few services)?


 Again, there would be no engineering benefit to that. You would likely
see operational benefits, yes, but s6 is the wrong place to try and get
those benefits, because it is not the cause of your operational costs.



   3. Generally how many instances of s6-supervise can run? We are running
   into a problem where we have 129 instances of s6-supervise that leads to
   higher memory consumption. We are migrating from systemd to s6 init system
   considering the light weight, but we have a lot of s6-log and s6-supervise
   instances that results in higher memory usage compared to systemd.  Is it
   fine to have this many number of s6-supervise instances?ps_mem data -
  5.5 MiB   s6-log (46) ,  14.3 MiB   s6-supervise (129)


 It is normally totally fine to have this many s6-supervise
instances (and s6-log instances), and it is the intended usage.
The skarnet.org server only has 256 MB of RAM, and currently sports 93
instances of s6-supervise (and 44 instances of s6-log) without any
trouble. It could triple that amount without breaking a sweat.

 The real problem here is that your instances appear to use so much
memory: *that* is not normal.
Every s6-supervise process should use at most 4 pages (16k) of private
dirty memory, so for 129 processes I would expect the memory usage to
be around 2.1 MB. Your reported total shows 7 times as much, which
sounds totally out of bounds to me, and even accounting for normal
operational overhead, a factor of 7 is *completely bonkers*.

 There are two possible explanations here:
 - Either ps_mem is not accurately tallying the memory used by a given
set of processes;
 - Or you are using a libc with an incredible amount of overhead, and
your libc (and in particular, I suspect, dynamic linking management in
your libc) is the culprit for the insane amount of memory that the
s6-supervise processes seem to be eating.

 The easiest way to understand what's going on is to find a
s6-supervise process's pid, and to perform
# cat /proc/$pid/smaps_rollup

 That will tell you what's going on for the chosen s6-supervise process
(they're all similar, so the number for the other s6-supervise processes
won't be far off). In particular, look at the Private_Dirty line: that
is the "real" amount of uncompressible memory used by that process.
 It should be around 16k, tops. Anything over that is overhead from
your libc.
 If the value is not too much over 16k, then ps_mem is simply lying to
you and there is nothing to worry about, except that you should use
another tool to tally memory usage.
 But if the value is much higher, then it is time to diagnose deeper:

# cat /proc/$pid/smaps

 That will show you all the mappings performed by your libc, and
the amount of memory that each of these mappings uses. Again, the
most important lines are the Private_Dirty ones - these are the
values that add up for every s6-supervise instance.
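
 If you want a quick tally instead of eyeballing every mapping, a rough
sketch like this sums the Private_Dirty lines for each s6-supervise
instance:

#!/bin/sh
# prints one "pid: total kB" line per s6-supervise process
for pid in $(pidof s6-supervise) ; do
  awk -v pid="$pid" '/^Private_Dirty:/ { sum += $2 }
    END { print pid ": " sum " kB" }' "/proc/$pid/smaps"
done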

 My hunch is that you will see *a lot* of mappings, each using
4k or 8k, or even in some cases 12k, of Private_Dirty memory.
If it is the case, unfortunately there is nothing I can do about it,
because that overhead is entirely caused by your libc.

 However, there is something *you* can do about it:

 - If "ldd /bin/s6-supervise" gives you a line mentioning libs6.so
or libskarnet.so, try recompiling s6 with --enable-allstatic. This
will link against the static version of libs6 and libskarnet, which
will alleviate the costs of dynamic linking. (The price is that the
*text* of s6-supervise will be a little bigger, but it doesn't 

Re: [PATCH] s6-rc-compile: Fix setting of flag-essential

2021-06-03 Thread Laurent Bercot




Don't set the bitposition (which is 0 for 'flag-essential')
to the flags, but the bit at the position.


 Ha, nice catch. Applied, thanks!

--
 Laurent



[announce] #s6 is moving from Freenode to OFTC

2021-05-24 Thread Laurent Bercot



 Hello,
 As some of you are aware of, last week, the Freenode IRC network was
subjected to a forceful change of its operational control. The history
and details of the change are deeply political in nature and very much
off-topic for our lists, and I really do not wish this event to be
discussed here; there is plenty of information and discussion to be had
in other corners of the Internet.
 The fact remains, however, that Freenode is no longer a safe IRC
network for free and open source software communities.
 As of now, the #s6 channel (and its associated off-topic channel) has
moved to the OFTC network: https://www.oftc.net/
 The skarnet.org documentation has been updated accordingly.
 Please update your pointers, and if you're currently hanging on
Freenode, please come to OFTC instead.
 The #s6 channel on Freenode will cease to be maintained, except for
a /topic  pointing to the new hosting network. At some point, it will
be moderated, i.e. you will not be able to write in it anymore.
 In the same way, there will be an empty #s6 channel on the libera.chat
network, pointing to the OFTC network as well.
 I know that some of you would have preferred to move to libera.chat,
the "official" successor of freenode, instead of OFTC. It was not an
easy choice to make, and no matter the decision, some people would
have ended up unhappy anyway. Please fight the undoubtedly overwhelming
urge to explain why the decision was wrong and the other one would have
been better; instead, feel free to address all complaints to Andrew Lee,
Crown Prince of Korea and originator of all this shitshow.
 Thank you. I hope to meet you on the other side.
--
 Laurent



[announce] New hypermail archives of skarnet.org mailing lists

2021-05-09 Thread Laurent Bercot



 Hello,

 ezmlm-cgi, the web interface to the archives of the skarnet.org
mailing-lists, has been broken for... forever, resulting in an
inability to display certain messages. I tried debugging it, but
could not find out what was happening within a reasonable amount
of time dedicated to it.

 A new web interface to these archives is now available, and this
one appears to work better (there are still a few quirks with
utf-8 display but we'll sort them out).

 https://skarnet.org/lists/skaware/
 https://skarnet.org/lists/supervision/

 If you are using web links to ML archived messages, please update
your links to point to the hypermail archive instead of the ezmlm
one. At some point in the (distant) future, the ezmlm archive will be
deprecated. Thanks.

 Enjoy.

--
 Laurent



Re: S6 logging

2021-03-09 Thread Laurent Bercot

I am trying to log the prints coming from daemons like dropbear using
s6-log, but couldn't make it.
Am I missing something?


 You really need to improve on your way of asking for help.

"couldn't make it" is not an actionable report. You need to say:
- exactly what you did (you did that, good, but you did not mention
that you were using s6-rc, which may or may not be relevant)
- exactly what you expected (you didn't do that, but it is easy to
figure out)
- exactly what happened (you didn't do that at all and we're left
to guess).

 Going out on a limb, and reading the manual page (I haven't used
dropbear in a little while), I'm guessing that the problem is this:



/usr/sbin/dropbear -F


 You also need the -E option in order to tell dropbear to send its
error messages to stderr instead of syslog.

 But if it still doesn't work with -F -E, then we'll need more precise
information.

--
 Laurent



Re: Path monitoring support in s6-services

2021-02-17 Thread Laurent Bercot

Since it requires individual instances of inotifyd for each service(s)
[which depends on  multiple files/paths modifications) to get started]


 Have you tried it and noticed an impact on your boot times?
(AKA: "Profile, don't speculate.")

--
 Laurent



Re: Path monitoring support in s6-services

2021-02-17 Thread Laurent Bercot

 inotifyd (or something similar) + s6-svc (or s6-rc)?

Thought of the same but I have many such services;Just thinking of cpu
overhead during the initial boot up.


 What makes you think this would have a noticeable impact on your CPU
load?

--
 Laurent



Re: Some suggestions on old-fashioned usage with s6 2.10.x

2021-02-15 Thread Laurent Bercot

(Apologies for the broken threading, I originally sent my answer with
the incorrect From: and it was rightfully rejected.)



Re: Some suggestions on old-fashioned usage with s6 2.10.x

2021-02-15 Thread Laurent Bercot




I do not really understand their excuse here.  CLI incompatibility is
trivially solvable by creating links (or so) for `halt' / `poweroff' /
`reboot', and even the `shutdown' command can be a wrapper for an `atd'
based mechanism.


 The options! The options need to be all compatible. :) And for
"shutdown", they would never implement a wrapper themselves, I would
have to do it for them - which is exactly what I did, although it's
a C program that actually implements shutdown, not a wrapper around an
atd program I can't assume will be present on the system.

 I'm not defending distros here, but it *is* true that a drop-in
replacement, in general, is a lot easier to deal with than a drop-in-
most-of-the-time-maybe-but-not-with-that-option replacement. Anyone
who has tried to replace GNU coreutils with busybox can relate.



  In case they complain about the implementation of the
CLI, the actual interface to `shutdownd' is not that similar to the
`telinit' interface (at least to the one I think it is) either.


 Which is why s6-l-i also comes with a runleveld service, for people
who need the telinit interface. shutdownd is only for the actual
stages 3 and 4, not service management (which telinit is a now obsolete
forerunner of).



If I understand it correctly, letting `s6-svscan' exec() stage 3 also
achieves immunity to `kill -KILL -1'.  I also find this "old-fashioned"
approach conceptually and implementationally simpler than an army of
`s6-supervise' restarting only to be killed again


 What army? By the time the final kill happens, the service manager
has brought everything down, and shutdownd has cleaned up the scandir,
only leaving it with what *should* be restarted. You seem to think
I haven't given these basic things the two minutes of attention they
deserve.

 Conceptually, the "old-fashioned" approach may be simpler, yes.
Implementationally, I disagree that it is, and I'll give you a very
simple example to illustrate it, but it's not the only thing that
implementations must pay attention to, there are a few other quirks
that I've stumbled upon and that disappear when s6-svscan remains
pid 1 until the very end.

 You're going to kill every process. The zombies need to be reaped,
else you won't be able to unmount the filesystems. So your pid 1
needs to be able to wait for children it doesn't know it has
(foreground does not) and guarantee that it doesn't try unmounting
the filesystems before having reapt everything (a shell does not give
ordering guarantees when it gets a SIGCHLD, even though it works in
practice). So for this specific use I had to add a special case to
execline's wait command, "wait { }", that waits on *everything*, and
also make sure that wait doesn't die because it's going to run as pid 1,
even very briefly.
 And after that, you need to make sure to unmount filesystems
immediately, because if you spawn other processes, you would first have
to wait on them as well.

 For every process that may run as pid 1, you need extra special care.
Using an interpreter program as pid 1 means your interpreter needs to
have been designed for it. Using execline means every execline binary
that may run as pid 1 needs to be vetted for it. If your shutdown
sequence is e.g. written in Lisp, and your Lisp interpreter handles
pid 1 duties correctly, okay, that's fair, but that's *two* programs
that need to do it, when one would be enough.
 s6-svscan has already been designed for that and provides all the
guarantees you need. When s6-svscan is running as pid 1, it takes away
a lot of mental burden off the shutdown sequence.



 and a `shutdownd'
restarting to execute the halting procedure (see some kind of "state"
here?  Functional programmers do not hate it for nothing).


 Yes, there is one bit of state involved. I think our feeble human minds,
and a fortiori computers, can handle one bit of state.



  I know this
seems less recoverable than the `shutdownd' approach, but does that
count as a reason strong enough to warrant the latter approach, if the
halting procedure has already been distilled to its bare essentials
and is virtually immune to all non-fatal problems (that is, excluding
something as severe as the absence of a `reboot -f' implementation)?


 My point is that making the halting procedure virtually immune to all
non-fatal problems is *more difficult* when you tear down the
supervision tree early. I am more confident in the shutdownd approach,
because it is less fragile, more forgiving. If there's a bug in it, it
will be easy to fix.

 I understand that the barebones approach is intellectually more
satisfying - it's more minimalistic, more symmetrical, etc. But shutting
down a machine is *not* symmetrical to booting it. When you boot, you
start with nothing and need a precise sequence of instructions in order
to build up to a functional system. When you shutdown, you have a fully
functional system already, that has proven to be working, and you just
need to clean up and make sure you 

Re: [s6-svperms] Handling service permissions at creation time.

2021-02-15 Thread Laurent Bercot

Services can fix their own permissions so if s6-rc is going to grow that
functionality it should be in the generated run, not in some rarely used
outboard helper service.


 As answered on IRC, for ML completeness: no, because permissions should
be fixed when the supervisor starts, not when the service starts. So a
oneshot that runs right after the supervisors are started is the
correct solution.

--
 Laurent



Re: [s6-svperms] Handling service permissions at creation time.

2021-02-15 Thread Laurent Bercot

s6-svperms is a great feature, but it only handles permission control of a
service at runtime. That means we need to change the permissions of the
service every time a reboot occurs.
For a server this is not really a big deal, but for a desktop machine it can
be really hard to handle, since the running services can be different at
each boot (the user can activate or deactivate services as needed).


 Right. The problem here is that the files holding the permissions all
exist in a tmpfs (typically they're all under /run), and are recreated
at every boot, with the default attributes.

 If you run a supervision tree on a non-tmpfs, then the attributes will
be stored on disk, and kept from one boot to the next.

 For an s6-linux-init + s6 + s6-rc installation, the service directories
are always on a tmpfs, so yes, the problem will manifest.



Obviously, a script launched at some point during boot (or after) can change
the permissions on the necessary services. However, I think this is neither
easier nor flexible.


 I disagree, I think it's the right way to address it; see below.



s6-supervise creates the control, status and event directories with the
uid:gid of the owner of the process (correct me if I'm wrong).


 That's correct - and the owner of the s6-supervise process is the owner
of the whole supervision tree.



So, if we have e.g. a /data/perms/rules/uid//allow file, and if
s6-supervise checks this directory at creation time and creates the necessary
files/directories with the respective uid/gid found in that directory, we can
configure a service's permissions permanently.


 The problem with this approach is the following:

 - The whole service directory is stored in RAM, so you cannot store
svperms attributes anywhere under the service directory - else you'll
have the exact same problem as you do now, the attributes will not
survive a reboot. :)

 - s6-supervise does not and will not look at anything outside of a
service directory. The service directory is the place for everything
related to s6-supervise. (And even then, s6-supervise stays out of
data/ and env/.) So if you need configuration that cannot be stored
in a service directory because the service directory is all in tmpfs,
s6-supervise is not the program that can handle that configuration.

 So, the best way to apply attributes to a set of service directories
is to have another process do it once the service directories have
been copied, because only an external process will be able to access
information that is stored on disk.

 Typically, if you're using s6-rc, this can be done via an s6-rc
service running early, before the longruns are started. The "up"
script can read attributes from a file and set them; the "down"
script can save all the attributes to a file.
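
 As a hedged sketch, such an early oneshot could look like this in the
s6-rc source definition format - the service name svperms-early and the
/etc/s6/apply-svperms and /etc/s6/save-svperms helpers are hypothetical
placeholders for whatever reads and writes your saved attributes:

   # source/svperms-early/type contains:
   oneshot

   # source/svperms-early/up contains:
   /etc/s6/apply-svperms

   # source/svperms-early/down contains:
   /etc/s6/save-svperms

 Longruns that care about their permissions would then list
svperms-early in their dependencies file, so s6-rc orders it before
them.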

 Ideally, though, the user would be able to declare the attributes
in service definition directories, and s6-rc would set them
automatically at start. That wouldn't help with early services, but
early services should be few and far between and their permissions
shouldn't be trifled with.

 I can add that functionality to the next version of s6-rc. What do
you think?

--
 Laurent



Re: timer support for s6 services

2021-02-10 Thread Laurent Bercot

Is there a way to set a timer option on a particular service 'X', so that
'X' gets restarted every 'timer' seconds?


 You can achieve that with another service that just sleeps for
'timer' seconds, then sends an s6-svc -r command to the service you want
restarted.
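
 For instance, a hedged sketch of such a restarter's run script - the
3600-second period, the scan directory and the service name X are
placeholders:

   #!/command/execlineb -P
   foreground { sleep 3600 }    # the 'timer' period
   s6-svc -r /run/service/X     # restart X, then exit

 The run script then exits, its own supervisor restarts it, and the
cycle repeats.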

--
 Laurent



Re: stage2 as a service

2021-02-01 Thread Laurent Bercot

For the normal case you are absolutely right. But with stage 2 as a service
you have a race condition between stage 2 and s6-svscan-log. The usual
trick for stage 2 solves this problem.


 Ah, now I get it: stage 2 must not start before the catch-all logger
is ready, so you open the fifo for writing with the intent to block
until the reader has started. Yes, that makes sense and is needed.

 However, I will list it as another drawback of your approach :P
The internal sauce is visible in your stage 2 script. Ideally, stage 2
would only run once everything is already in place and wouldn't have
to bother with any mechanism details. (s6-linux-init achieves this.)



Well, running once has been part of supervise from the start, by djb. It
was invented for oneshots.


 I disagree. svc -o was made for short-lived processes that you may want
to run again. Or for testing a crashing daemon without having to deal
with automatic restarts and failure loops. Anything that you
potentially run more than once. It's "run once", but still in the
context of supervision; you would only use it on services that have
at least the potential to make use of supervision.
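
 The s6 equivalent of that usage, with a placeholder service path:

   s6-svc -o /run/service/mydaemon   # start it now, but do not restart it when it exits

 It fits the "may run more than once, still supervised" case exactly.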

 For things that you will only ever run once, why even supervise them in
the first place? Supervision is a wonderful tool, but like any tool, it
should only be used when appropriate, and I don't think the one-time
initialization script is the place for it.



Well, s6-rc is using ./down, too. The shutdown is a very special case for
supervision.


 There is an important difference.
 s6-rc is using ./down files for services that it wants down,
independently from the machine's lifetime. Typically it's using them
at boot time, in order to have the supervisors start, but not the
services themselves (they'll be brought up later on according to the
dependency graph). s6-rc *wants* the services to be supervised, even
if it's not starting them at the same time as the supervisors.

 You're only using a ./down file at shutdown time because you have a
service that you know must not restart when the supervisor is killed,
and will never restart, and has not been restarted for the entire
machine lifetime. The presence of the supervisor here is not a feature,
it brings you no value, on the contrary - it's only making your life
more difficult.



Stage 2 as a service allows us to restart it, if - accidentally - it becomes
necessary. Obviously, that should very seldom be the case.


 Honestly, I can't think of a single case where you'd need to restart
the initialization sequence of your machine. Anything you'd want to
restart once the system has booted should be handled by the service
manager.



I'm still migrating from systemd to s6{,-rc} with /fs/* step by step.
Therefore, I need more flexibility than s6-linux-init.


 The migration from systemd's service manager (to s6-rc or anything
else) is totally independent of the init system change. You can
make a systemd oneshot that launches s6-rc-init then s6-rc, and
convert all your systemd services one by one to s6-rc services; then,
once you don't depend on systemd for anything other than the early
boot and the s6-rc service, you can switch inits, and then you should
be able to use s6-linux-init.
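
 A hedged sketch of what that oneshot could run, assuming an s6-svscan
instance on /run/service that systemd itself runs (and has started) as a
service, a compiled database in /etc/s6-rc/compiled, and a top-level
bundle named default - all of these names are assumptions:

   # at boot: initialize the live state, then bring the default bundle up
   s6-rc-init -c /etc/s6-rc/compiled -l /run/s6-rc /run/service
   s6-rc -l /run/s6-rc -u change default

   # at shutdown: bring every active service down
   s6-rc -l /run/s6-rc -da change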

 I generally recommend doing the opposite: switching to s6-linux-init
first then converting services to s6-rc, because the latter is a lot
more work: for instance Adélie uses s6-linux-init but still has
OpenRC as its service manager, because I haven't done the conversion
work yet. However, it's different with systemd, because systemd cannot
be run as not-pid-1 - its service manager cannot be separated from its
early boot functionality. So you have to keep it as init until it's
not needed for anything else. They call it modular software ;)

--
 Laurent



Re: stage2 as a service [was: Some suggestions on old-fashioned usage with s6 2.10.x]

2021-01-31 Thread Laurent Bercot


 Hi Stefan,
 Long time no see!

 A few comments:



# optional:  -- Question: Is this necessary?
  redirfd -w 0 ${SCANDIR}/service/s6-svscan-log/fifo
  # now the catch all logger runs
  fdclose 0


 I'm not sure what you're trying to do here. The catch-all logger
should be automatically unblocked when
${SCANDIR}/service/s6-svscan-log/run starts.
 The fifo trick should not be visible at all in stage 2: by the time
stage 2 is running, everything is clean and no trickery should take
place. The point of the fifo trick is to make the supervision tree
log to a service that is part of the same supervision tree; but once
the tree has started, no sleight of hand is required.
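
 For reference, a hedged sketch of where the trick belongs, i.e. at the
very end of stage 1 only (the scandir path and layout are assumptions):

   redirfd -wnb 1 /run/service/s6-svscan-log/fifo  # open the catch-all fifo without blocking
   s6-svscan /run/service                          # the whole tree now logs there

 Stage 2 then inherits a working logging chain and never needs to know
the fifo exists.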



foreground { s6-svc -O . } # don't restart me


 If you have to do this, it is the first sign that you're abusing
the supervision pattern; see below.



foreground { s6-rc -l ${LIVEDIR}/live -t 1 change ${RCDEFAULT} }
# notify s6-supervise:
fdmove 1 3
foreground { echo "s6-rc ready, stage 2 is up." }
fdclose 1  # -- Question: Is this necessary?


 It's not strictly necessary to close the fd after notifying readiness,
but it's a good idea nonetheless since the fd is unusable afterwards.
However, readiness notification is only useful when your service is
actually providing a... service once it's ready; here, your "service"
dies immediately, and is not restarted.
 That's because it's really a oneshot that you're treating as a
longrun, which is abusing the pattern.



# NB: shutdown should create ./down here, to avoid race conditions


 And here is the final proof: in order to make your architecture work,
you have to *fight* supervision features, because they are getting in
your way instead of helping you.
 This shows that it's really not a good idea to run stage 2 as a
supervised service. Stage 2 is really a one-time initialization script
that should be run after the supervision tree is started, but *not*
supervised.



  { # fallback login
sulogin --force -t 600 # timeout 600 seconds, i.e. 10 minutes.
# kernel panic
  }


 Your need for sulogin here comes from the fact that you're doing quite
complex operations in stage 1: a user-defined set of hooks, then
several filesystem mounts, then another user-defined set of hooks.
And even then, you're running those in foreground blocks, so you're
not catching the errors; the only time your fallback activates is if
the cp -a from ${REPO} fails. Was that intended?

 In any case, that's a lot of error-prone work that could be done in
stage 2 instead. If you keep stage 1 as barebones as possible (and
only mount one single writable filesystem for the service directories)
you should be able to do away with sulogin entirely. sulogin is a
horrible hack that was only written because sysvinit is complex enough
that it needs a special debugging tool if something breaks in the
middle.
 With an s6-based init, it's not the case. Ideally, any failure that
happens before your early getty is running can only be serious enough
that you have to init=/bin/sh anyway. And for everything else, you have
your early getty. No need for special tools.



Also I may switch to s6-linux-init finally.


 It should definitely spare you a lot of work. That's what it's for :)

--
 Laurent


Re: Some suggestions on old-fashioned usage with s6 2.10.x

2021-01-29 Thread Laurent Bercot

But even `s6-reboot' from older s6-linux-init, or `busybox reboot'
with slew can already do that...


 Yes. And as your sharp mind undoubtedly noticed, those commands are
not the same as "reboot".

 Which means burden on users.

 Yes, I also thought it was a small burden at first, but it's not.
It means that all sysvinit-compatible automation does not work, so
there is some porting work to do. And the gap between "a little work"
and "zero work" is HUGE. It's much bigger than the gap between
"a little work" and "a lot of work".

 Bear in mind that my eventual goal for s6 is distro adoption. And
distro maintainers will find any and every excuse to reject it.
Having a "shutdown" command that works exactly like sysvinit's
shutdown is essential, because it deals with a major objection, which
is incompatibility and user-unfriendliness.



There is some non-trivial trade-off: in short, the existence of the
supervision tree after stage 2 is by itself a kind of "special case"
(eg. search for "careful handling" in [1]).


 I feel like you misinterpreted my meaning.
 The *absence* of a supervision tree after stage 2 is precisely what
requires careful handling, and runit only works because Linux has
that peculiarity that kill -9 -1 does not kill the emitter!
 Having a supervision tree in stage 3 actually *helps* with the
late shutdown procedure: shutdownd dies right after the kill (which
would make it usable even on a system without the Linux special case)
and is restarted by the supervisor for stage 4.



  I am also thinking about
an application scenario where a supervision tree with a new s6 version
replaces the active tree with an old version.  This is somewhat silly:
it can be a little useful in case of a major version bump, but is probably
better solved by a complete reboot, to completely get rid of all old things
(s6 or not, updated together) in memory.


 Yes, upgrading your init without rebooting is generally not worth
it. Note that s6-svscan could still be configured to do that with
clever use of SIG scripts; but restarting the s6-supervise processes
is a pain to do without restarting your whole supervision tree, so
it's probably better to just reboot.
 This is the case with every single init out there, so you can't paint
that as a drawback of s6. You can wish it were easier, and I agree
that it would be nice, but the necessary trade-offs to make rebootless
init upgrades viable are very much not worth it.



 all-in-all has just less of a "screwdriver and duct tape" feel than
 a bunch of execline (or rc ;)) scripts.

I am very sorry, but I do feel a strong smell of systemd mindset here :(


 A systemd mindset in an attempt to be a drop-in replacement for
sysvinit. Yeah, right.

 More seriously, you're being unfair, because you're not locked in
at all. You can use the new s6-linux-init and *still* do everything
you were doing before:
 - you can manually edit your run-image
 - you can remove the runleveld service (which is only used for
telinit emulation) and even the shutdownd service
 - you can write SIG scripts to do shutdowns the preferred way
 - I absolutely recommend against doing this, but you *still* have
a place in stage 1 where you can fiddle with things: in the
init script before the call to the s6-linux-init binary.

 So basically, all you're complaining about is that s6-linux-init-maker
is not generating your preferred run-image layout out-of-the-box
anymore. Well, you're an advanced user, you know what you are doing;
the knobs and levers are *still all there*. The only binary that
kinda hardcodes things is s6-linux-init itself, and if you give it a
try, I'm pretty sure you'll like it, because there was never any reason
to modify the core of stage 1 in the first place and what it does is
what any kind of stage 1 needs to do, no matter what language it's
written in.
 And if you don't like it, you're still free to ditch the s6-linux-init
package entirely and keep using your own stage 1.

 Besides, when systemd advocates paint sysv-rc shell scripts as
"duct tape", they're *right*. sysv-rc (and OpenRC) scripts are loaded
with boilerplate that only exists to compensate for the lack of a
supervision infrastructure, and systemd, like any supervision system,
does away with that. systemd has 99 problems, but rightly calling out
oversized script scaffoldings ain't one. Its disingenuousness lies in
pretending that an overengineered, opaque, all-encompassing, inescapable
framework is better than the duct tape; and I think you'll find that
s6-linux-init isn't quite the monster you seem to believe it is.

--
 Laurent



Re: Some suggestions on old-fashioned usage with s6 2.10.x

2021-01-29 Thread Laurent Bercot

Currently I do not understand the `s6-linux-init-shutdown(d)' way
well, so the old-fashioned way is retained at least for now, given its
simplicity in implementation and seemingly better flexibility.  Frankly
it is my intuition that the new way costs more than the old way, but
does not provide that much in return.  (Feel free to prove me wrong.)


 It may cost more *to you*, but there is real and significant value
in following existing interfaces that people are familiar with. Being
able to just use "reboot" instead of the, uh, slightly less intuitive
"s6-svscanctl -6 /run/service" to reboot your machine, is one fewer
obstacle on the way to mainstream s6 adoption.

 Additionally, and maybe more to your liking, there are also technical
benefits to never killing s6-svscan. Being able to assume that a
supervision tree will be operational at all times, including during
shutdown (and even in stage 4!), is really comfortable, it cuts down
on a lot of specialcasing, it makes shutdown procedures recoverable,
integration into various configurations easier (I'm thinking
containers with or without a catch-all logger, for instance), and
all-in-all has just less of a "screwdriver and duct tape" feel than
a bunch of execline (or rc ;)) scripts.

--
 Laurent



Re: Some suggestions on old-fashioned usage with s6 2.10.x

2021-01-29 Thread Laurent Bercot

Actually I do visit the CGit web interface fairly often


 Oh, my bad, the links in the skaware documents actually point to
https://git.skarnet.org/something. Fair enough then, I have made
git.skarnet.org an explicit alias to skarnet.org.



 Perhaps I need to batch change all
 references in the UP2020 document to
...


 No need - I'll own that one, and keep the alias explicitly working.
It's not like subdomains are a scarce resource.

--
 Laurent



Re: Some suggestions on old-fashioned usage with s6 2.10.x

2021-01-28 Thread Laurent Bercot

BTW,  seems to be returning empty HTTP replies
now; both  and  work as
expected though.


 That is a side effect of a recent s6-networking addition, where
s6-tlsd passes the SNI server name to the application via an
environment variable. Which allows me to serve virtual hosts even with
an HTTP/1.0 server, but only under TLS. Fun experiment. :)

 I may change it back, but I don't think the current state is broken,
because you're not supposed to access git.skarnet.org via HTTP(S)! :P

--
 Laurent



Re: Some suggestions on old-fashioned usage with s6 2.10.x

2021-01-28 Thread Laurent Bercot




I did not actively follow the recent evolution of s6, and have just been
bitten badly by s6 2.10.x on my Alpine servers (where slew [1] is used
of course) when it came along with other updates.


 Sorry. This bears repeating: major version upgrades may break things.

 Compatibility is a good thing, that's why I try to keep major version
changes few and far between; but the other side of the coin is that
when I'm doing one, I want to make use of it and cram all the
incompatible changes that may be needed in the foreseeable future.
 So, you have to pay attention less often, but when it happens, you do
have to pay attention. Previous major version changes may have gone
smoothly - I try to keep it as smooth as possible when there's no need
to break UX - but it's no guarantee that it will always be smooth
sailing. This time, there were very visible user changes; sorry for
the inconvenience, but I reserve the right to do this, and I try to
document the breaking changes in the release notes.

 It is, admittedly, a drawback of distributions that they make major
version upgrades very silent - so, if you have local software that
relies on an old API, and the distro updates it under your feet,
you're caught unaware. I don't have a satisfying solution to this;
maybe I should have added a post-upgrade file printing red blinking
bold text, but that doesn't address automated or afk updates.



better if we kept the option supported for a transition period, and only
removed it from the manual pages while urging users to get rid of
it.  After all, in this case, silently ignoring `-s' is behaviourally
similar to (if not perfectly compatible with) the old `s6-svscan'.


 It's always a delicate balance, because "better" is not one-dimensional.

It would be better UX, yes, definitely. But also legacy code to maintain
until the next major update (which can take a while), and I tend to
assign a *very* high cost to legacy code in s6-svscan and s6-supervise,
for obvious reasons. And in my experience, few people (and you,
Casper, are certainly among them!) actually bother changing their
scripts as long as they keep working - most only spring into action when
something breaks. A compromise I've found relatively efficient was to
add nagging warnings on deprecated option use, but 1. that's even more
code that will be removed, and 2. I hate nagware, with a passion, in
all its forms.
 There is no really good solution, and I prefer a short, sharp pain
(when things break) followed by relief (when they're fixed) to a long
dull ache (maintaining compat code). Especially when I'm not the one
experiencing the sharp pain ;)



Second, `s6-svscan' now waits for its `s6-supervise' children to exit
before exec()ing `.s6-svscan/finish'


 You seem to have found the proper way of managing this with SIG files,
but just in case: "s6-svscanctl -tb" will net you the old behaviour.

--
 Laurent



[announce] skalibs-2.10.0.1, execline-2.7.0.1, s6-2.10.0.1

2021-01-25 Thread Laurent Bercot



 Hello,

 New skarnet.org packages are available:

 skalibs-2.10.0.1
 execline-2.7.0.1
 s6-2.10.0.1

 Those are bugfix releases.
 I normally don't announce bugfix releases, but the bugs that have
been fixed here are pretty visible (sorry about that!), so all users
are encouraged to upgrade ASAP.

 https://skarnet.org/software/skalibs/
 https://skarnet.org/software/execline/
 https://skarnet.org/software/s6/

 Enjoy,
 More bug-reports always welcome.

--
 Laurent



Re: s6-man-pages updated to 2.10.0.0

2021-01-22 Thread Laurent Bercot

Very nice. Not to nitpick though, the standard way of handling
out-of-version versioning is with an underscore, not an additional dot.
So your releases would be v2.10.0.0_1 (and so on).


 The additional dot was on my suggestion, to avoid conflicting with
distros wanting to package s6-man-pages. Distros often use underscores
in their package naming schemes, and distro-version isn't always the
exact same as upstream-version.

--
 Laurent



Re: The multisubstitute commands in the run script generated by s6-usertree-maker are in a wrong order

2021-01-22 Thread Laurent Bercot

As shown above, the multisubstitute command that contains XDG_RUNTIME_DIR is 
put after the one that contains USER, HOME, UID, GID, and GIDLIST. If for 
example XDG_RUNTIME_DIR=/run/user/$UID, the $UID here will not be substituted 
with the user's UID since by the time $UID is substituted, $XDG_RUNTIME_DIR 
hasn't been substituted yet. So perhaps the order of these two multisubstitute 
commands should be inverted.


 You're right, of course. I remember testing it, and it *working*, so
I did not think any further, but in retrospect it appears my test was
incorrect.
 Thanks for the report! Fixed in git head.
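
 For readers of the archive, a hedged, self-contained illustration of
why the order matters - this is not the actual generated script, and the
values are made up:

   #!/command/execlineb -P
   # Suppose the environment contains, literally and unexpanded:
   #   XDG_RUNTIME_DIR=/run/user/${UID}     UID=1234
   multisubstitute { importas -i XDG_RUNTIME_DIR XDG_RUNTIME_DIR }
   # the line above injects the text /run/user/${UID} into the script
   multisubstitute { importas -i UID UID }
   # the line above then resolves that ${UID} to 1234
   echo ${XDG_RUNTIME_DIR}   # prints /run/user/1234

 With the two blocks in the other order, ${UID} is substituted before
the text containing it ever appears in the script, so the unexpanded
string leaks through.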

 I'll give it the week-end, in case more bug-reports come in, then
I'll release 2.10.0.1.

 Note that skalibs-2.10.0.1 is out already, it fixes a bug that
manifests in execline's 'emptyenv -c' command.

--
 Laurent


