I've observed this error under similar circumstances: launching a `guix
system container` script with network sharing enabled, on a foreign distro
(Debian 10) with nscd running.

Using `strace -f /gnu/store/...-run-container`, we can observe the
container's lookup of user accounts via the foreign distro's nscd socket:

[pid 16582] socket(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 11
[pid 16582] connect(11, {sa_family=AF_UNIX, sun_path="/var/run/nscd/socket"}, 110) = 0
[pid 16582] sendto(11, "\2\0\0\0\0\0\0\0\t\0\0\0postgres\0", 21, MSG_NOSIGNAL, NULL, 0) = 21
[pid 16582] poll([{fd=11, events=POLLIN|POLLERR|POLLHUP}], 1, 5000) = 1 ([{fd=11, revents=POLLIN}])
[pid 16582] read(11, "\2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\377\377\377\377\377\377\377\377\0\0\0\0\0\0\0\0"..., 36) = 36
[pid 16582] close(11)                   = 0

Since the user ("postgres") is indeed missing on the foreign distro, the
lookup fails. In this case, disabling nscd on the foreign distro allowed
the container script to run without error.
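
To make the bytes in the trace easier to read, here is a minimal Python
sketch that encodes the GETPWBYNAME request and decodes the 36-byte reply
header. The field layout and constants (NSCD_VERSION = 2, GETPWBYNAME = 0,
nine 32-bit fields in pw_response_header) are my reading of glibc's
nscd/nscd-client.h and are assumptions about the glibc version in use; the
little-endian byte order is assumed from the x86 host. The helper names are
hypothetical. Decoded this way, the reply says version=2, found=0,
uid=gid=0xffffffff. As I read glibc's nscd/nscd_getpw_r.c, found == 0 is
treated as an authoritative "no such user" (only found == -1 makes the
client stop using nscd), so the container never falls back to its own
/etc/passwd.

```python
import socket
import struct

# Constants as I read them from glibc's nscd/nscd-client.h (assumption).
NSCD_VERSION = 2
GETPWBYNAME = 0
NSCD_SOCKET = "/var/run/nscd/socket"

# pw_response_header: version, found, pw_name_len, pw_passwd_len,
# pw_uid, pw_gid, pw_gecos_len, pw_dir_len, pw_shell_len (9 x 32-bit).
# Assumes little-endian byte order, as on the x86 host in the trace.
PW_RESPONSE = struct.Struct("<iiiiIIiii")
PW_FIELDS = ("version", "found", "name_len", "passwd_len",
             "uid", "gid", "gecos_len", "dir_len", "shell_len")


def build_getpwbyname_request(user: str) -> bytes:
    """Hypothetical helper: request_header followed by the NUL-terminated name."""
    key = user.encode() + b"\0"
    return struct.pack("<iii", NSCD_VERSION, GETPWBYNAME, len(key)) + key


def parse_pw_response_header(buf: bytes) -> dict:
    """Hypothetical helper: decode the fixed-size header nscd sends back."""
    return dict(zip(PW_FIELDS, PW_RESPONSE.unpack(buf[:PW_RESPONSE.size])))


def query_nscd(user: str, path: str = NSCD_SOCKET) -> dict:
    """Send one GETPWBYNAME request to a running nscd and decode the header."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.settimeout(5.0)  # mirrors the 5000 ms poll timeout in the trace
        s.connect(path)
        s.sendall(build_getpwbyname_request(user))
        buf = b""
        while len(buf) < PW_RESPONSE.size:
            chunk = s.recv(PW_RESPONSE.size - len(buf))
            if not chunk:
                raise ConnectionError("nscd closed the connection early")
            buf += chunk
    return parse_pw_response_header(buf)
```

Running query_nscd("postgres") on the Debian host should reproduce the
found=0 reply seen inside the container.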

Based on comments in https://issues.guix.info/issue/28128, I see that it
was a deliberate choice to bind-mount the foreign distro's nscd socket
inside the container (instead of starting a separate containerized nscd
instance). But I'm having trouble seeing why it's acceptable to leak state
from the foreign distro's user space into the container. Is there something
I'm missing?

Cheers,

Jason
