Hello,

I stumbled on what might be an odd bug.

Most of our production workloads are containerised. We use Docker CE for builds and execution on Linux. Outside of my day job, I have grown quite fond of s6, and so it felt natural to make use of it in these containerised environments.

(Please skip the next two paragraphs if you are familiar with Docker.)

Many of our staff, myself included, happen to work on Apple machines. Docker (the company) distribute 'Docker Desktop' for the Mac platform. Essentially, this tool works by shipping a tiny Linux virtual machine and a whole lot of magic that plumbs the VM into the host operating system. The magic works surprisingly well.

Over the past couple of years, Apple have pivoted their computing products away from amd64/x86-64 to their own arm64-based CPUs. This upstream change would have posed some inconvenience to Docker's existing Mac userbase, and so they added automatic amd64 emulation on arm64 host machines -- courtesy of the QEMU user space emulator. The end result provides user-friendly (easy!) execution of linux/amd64 executables on a darwin/arm64 host. Again, the magic works surprisingly well. Performance aside, I have not been stumped by any observable difference in behaviour between these platforms until today... in s6.

Included below is the source for a simple container. PID 1 in the container is s6-svscan. s6-svscan forks one s6-supervise process, which in turn forks a simple sleep(1). The experiment uses s6 as bundled with Debian bullseye; deb version 2.10.0.2-1.

https://packages.debian.org/bullseye/s6
https://salsa.debian.org/zhsj/s6/-/tree/debian/2.10.0.2-1

Interesting things happen when we signal s6-svscan with SIGTERM:

docker kill -s TERM <container-name>

The supervision tree and s6-svscan is terminated when executed without any QEMU emulation. The container then ceases execution as one would expect. This works fine with dockerd executing natively on an linux/amd64 host machine, and also when Docker Desktop is executing native arm64 images on an Apple M1 Pro.

However, if we repeat the experiment with an amd64 image on an arm64 Mac (i.e.: with qemu) something completely different happens. None of the three processes are terminated. s6-svscan continues running. The container never terminates.

Note that svc/.s6-svscan/SIGTERM is not marked executable. This was intentional. After reading src/supervision/s6-svscan.c, I thought this might provide a hint as to what is happening.

Output from the happy case (without qemu) is as expected:

% docker run --rm ... > *signalled*
s6-svscan: warning: unable to spawn .s6-svscan/SIGTERM: Permission denied
*terminates*

Output from the sad case (with qemu) is quite curious:

% docker run --rm ...
WARNING: The requested image's platform (linux/amd64) does not match the 
detected host platform (linux/arm64/v8) and no specific platform was requested
*signalled*

Notably, there is no stderr write from strerr_warnwu2sys() to say the call to posix_spawn(3) failed.

ptrace is not available in qemu, though we do have qemu -strace, which sort of emulates the output from Linux's strace(1). Things get even more curious here.

For this particular test, I forked s6-svscan from tini. tini was PID 1. This made the initial signal delivery more obvious in the trace. Problematic s6-svscan behaviour otherwise appeared to be the same as the test without tini.

PID 1  is tini.
PID 11 is s6-svscan.
PID 25 appears to be the process forked for .s6-svscan/SIGTERM.

1 wait4(4294967295,274903107948,1,0,0,274905173824) = 0
1 rt_sigtimedwait(274903108064,274903107856,274877935648,8,0,274905173824) = -1 
errno=11 (Resource temporarily unavailable)
1 wait4(4294967295,274903107948,1,0,0,274905173824) = 0
1 rt_sigtimedwait(274903108064,274903107856,274877935648,8,0,274905173824) = -1 
errno=11 (Resource temporarily unavailable)
1 wait4(4294967295,274903107948,1,0,0,274905173824) = 0
1 rt_sigtimedwait(274903108064,274903107856,274877935648,8,0,274905173824) = 15
1 kill(11,SIGTERM) = 0
1 wait4(4294967295,274903107948,1,0,0,274905173824) = 0
11 ppoll(274903108336,2,274903106832,0,8,1) = 1
11 clock_gettime(CLOCK_MONOTONIC,0x0000004001808560) = 0 ({tv_sec = 
356286,tv_nsec = 304382892})
11 read(6,0x1808550,128) = 128
11 mmap(NULL,36864,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANONYMOUS|0x20000,-1,0) 
= 0x0000004001a89000
11 rt_sigprocmask(SIG_BLOCK,0x0000004001a09f40,0x0000004001808100) = 0
11 
clone(CLONE_VM|CLONE_VFORK|0x11,child_stack=0x0000004001a91ff0,parent_tidptr=0x0000004001a74740,tls=0x0000000000000000,child_tidptr=0x00000040018085d0)
 = 25
11 munmap(0x0000004001a89000,36864) = 0
11 rt_sigprocmask(SIG_SETMASK,0x0000004001808100,NULL) = 0
11 read(6,0x1808550,128) = -1 errno=11 (Resource temporarily unavailable)
11 
clone(CLONE_VM|CLONE_VFORK|0x11,child_stack=0x0000004001a91ff0,parent_tidptr=0x0000004001a74740,tls=0x0000000000000000,child_tidptr=0x00000040018085d0)
 = 0
25 rt_sigprocmask(SIG_BLOCK,NULL,0x0000004001a91ea0) = 0
25 rt_sigaction(SIGHUP,NULL,0x0000004001a91dd0) = 0
25 rt_sigaction(SIGHUP,0x0000004001a91d30,NULL) = 0
...
25 rt_sigaction(61,NULL,0x0000004001a91dd0) = 0
25 rt_sigaction(61,0x0000004001a91d30,NULL) = 0
25 rt_sigprocmask(SIG_SETMASK,0x00000040018084b8,NULL) = 0
25 execve(".s6-svscan/SIGTERM",{".s6-svscan/SIGTERM",NULL}) = -1 errno=13 
(Permission denied)
25 exit_group(127)
11 ppoll(274903108336,2,274903106832,0,8,1) = 1
11 clock_gettime(CLOCK_MONOTONIC,0x0000004001808560) = 0 ({tv_sec = 
356286,tv_nsec = 308767933})
11 read(6,0x1808550,128) = 128
11 read(6,0x1808550,128) = -1 errno=11 (Resource temporarily unavailable)
11 wait4(4294967295,274903106972,1,0,8,1) = 25
11 wait4(4294967295,274903106972,1,0,8,1) = 0
1 rt_sigtimedwait(274903108064,274903107856,274877935648,8,0,274905173824) = -1 
errno=11 (Resource temporarily unavailable)
1 wait4(4294967295,274903107948,1,0,0,274905173824) = 0

(Many intervening calls to rt_sigaction() redacted for brevity.)

The call to posix_spawn(3) appears evident there, but there is no evidence of any follow-up to the failure. s6-svscan should call term() and die soon afterwards but that never happens.

Repeating the test with no .s6-svscan/SIGTERM file produced a similar result (albeit with errno=2 on exec).

At this stage, I am unsure where the problem lies. It may not be in s6. Aside from this quirk, s6 does seem to function just fine under qemu. Non-trivial supervision trees come up without a hitch. I first noticed this problem while working on a container with supervised HAProxy and Go processes -- all non-trivial programs. No problems were observed in these other programs under emulation.

I suppose the next step would be to avail myself of some real debugging tools -- or at least put myself in a position where I can build an s6 with some simple printf debug statements. In the meantime, I thought I'd reach out in case I have missed something obvious.

Test input:

% tar --format=mtree --options 'mtree:!all,mtree:flags,mode,size,type' -c .
#mtree
. mode=700 type=dir
./Dockerfile mode=644 type=file size=389
./svc mode=755 type=dir
./svc/.s6-svscan mode=755 type=dir
./svc/.s6-svscan/SIGTERM mode=644 type=file size=0
./svc/sleep mode=755 type=dir
./svc/sleep/run mode=755 type=file size=43

% find -s . \! -type d -print0 | xargs -0 grep .
./Dockerfile:FROM --platform=linux/amd64 debian:bullseye-slim
./Dockerfile:RUN DEBIAN_FRONTEND=noninteractive apt-get update \
./Dockerfile:  && DEBIAN_FRONTEND=noninteractive apt-get -y dist-upgrade \
./Dockerfile:  && DEBIAN_FRONTEND=noninteractive apt-get -y install 
--no-install-recommends \
./Dockerfile:    procps \
./Dockerfile:    psmisc \
./Dockerfile:    s6 \
./Dockerfile:  && DEBIAN_FRONTEND=noninteractive apt-get clean
./Dockerfile:COPY svc/ svc/
./Dockerfile:ENTRYPOINT ["/usr/bin/s6-svscan", "/svc"]
./svc/sleep/run:#!/bin/sh
./svc/sleep/run:# GNU sleep
./svc/sleep/run:exec sleep infinity

Cheers,

--
Saj Goonatilleke

Reply via email to