On 2025-09-30 09:28, Óscar García Amor wrote:
It is the last line
of `build()`, `./build/telegraf config > telegraf.conf`. It is at this
point that the execution stops completely if I build the package with
`extra-x86_64-build`.

Can you think of any reason why this might happen or how it could be
fixed?

I've reproduced the issue and narrowed it down to a likely interaction
between systemd-nspawn and Go's threading/process handling.
`extra-x86_64-build` implicitly uses systemd-nspawn to containerize the build.

The `telegraf config` call does produce its output, but then hangs on
termination in an epoll_pwait/futex wait loop, as described
in [issue 55120](https://github.com/golang/go/issues/55120).

To reproduce, I ran the build once until it hung, aborted it with
Ctrl-C, then ran `telegraf config` manually in a minimal
systemd-nspawn container. It always hangs.

I also installed strace in the container root to see exactly where
the process hangs, which is how I found the unresolved Go issue 55120:

```
sudo systemd-nspawn -D /var/lib/archbuild/extra-x86_64/gyroplast /build/telegraf/src/telegraf-1.36.2/build/telegraf config
sudo systemd-nspawn -D /var/lib/archbuild/extra-x86_64/gyroplast strace -x -y -v -ff /build/telegraf/src/telegraf-1.36.2/build/telegraf config
```
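
For anyone scripting the reproduction instead of waiting to press Ctrl-C, the manual run can be wrapped in coreutils `timeout` so a hang shows up as an exit status. This is only a sketch: the helper name `run_with_deadline` and the 30-second limit in the usage note are assumptions for illustration, not part of the PKGBUILD, and the wrapper detects the hang rather than fixing it.

```shell
#!/bin/sh
# Hypothetical helper: run a command, but give up after a deadline.
# Usage: run_with_deadline SECONDS COMMAND [ARGS...]
run_with_deadline() {
    limit=$1
    shift
    # coreutils `timeout` sends SIGTERM at the deadline and exits with 124.
    timeout "$limit" "$@"
    status=$?
    if [ "$status" -eq 124 ]; then
        echo "still running after ${limit}s, assuming a hang: $*" >&2
    fi
    return "$status"
}
```

Inside the container this could be called as `run_with_deadline 30 /build/telegraf/src/telegraf-1.36.2/build/telegraf config`. If the process does not exit on SIGTERM, `timeout -k` can follow up with SIGKILL after a grace period.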

Unfortunately I'm running out of time to look into this further, but
as this is easily reproducible, someone with better knowledge of
Go and/or systemd-nspawn peculiarities may pick up here. My gut says
this may be a systemd-nspawn configuration issue that affects Go's
threading or signal handling (there's a SIGURG passed between processes)
and hinders Go's process cleanup on termination.
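
One possible starting point for checking the signal-handling hunch is to compare how SIGURG is blocked, ignored, or caught for the hanging process inside and outside the container, using the `SigBlk`, `SigIgn`, and `SigCgt` hex masks in `/proc/PID/status`. The sketch below is an assumption-laden aid, not a diagnosis: the helper names `sig_in_mask` and `show_sigurg` are hypothetical, it assumes SIGURG is signal 23 (true on x86_64 Linux), and it only handles signals 1 to 32.

```shell
#!/bin/sh
# Hypothetical helper: succeed if signal number $2 (1..32) is set in the
# hex mask $1, as printed in the SigBlk/SigIgn/SigCgt lines of
# /proc/PID/status. Bit (sig - 1) of the mask corresponds to signal sig.
sig_in_mask() {
    low=$1
    sig=$2
    # Keep only the low 8 hex digits so the value fits any shell's integers.
    while [ "${#low}" -gt 8 ]; do
        low=${low#?}
    done
    [ $(( (0x$low >> ($sig - 1)) & 1 )) -eq 1 ]
}

# Print the three signal mask lines for a PID and flag SIGURG in each.
# Assumes SIGURG is 23, as on x86_64 Linux.
show_sigurg() {
    pid=$1
    grep -E '^Sig(Blk|Ign|Cgt):' "/proc/$pid/status" |
    while read -r name mask; do
        if sig_in_mask "$mask" 23; then state=set; else state=clear; fi
        echo "$name $mask SIGURG=$state"
    done
}
```

Running `show_sigurg PID` against the hanging `telegraf` process in the container and against a run on the host would show whether the masks differ. Whether they do is not something this thread has established.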

Using `taskset -a -c 1 ./build/telegraf config` does NOT work around the
issue; the binary still hangs in the same way.

An excerpt of my `strace -x -y -v -ff` output, showing one period of the
periodically repeating hang loop:

 <unfinished ...>
[pid     5] <... epoll_pwait resumed>, [], 128, 999, NULL, 0) = 0
[pid     4] <... futex resumed>)        = -1 ETIMEDOUT (Connection timed out)
[pid     5] futex(0x5f2c52e08130, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid     4] sched_yield( <unfinished ...>
[pid     5] <... futex resumed>)        = 0
[pid     4] <... sched_yield resumed>)  = 0
[pid     5] openat(AT_FDCWD</>, "/proc/3/stat", O_RDONLY|O_CLOEXEC <unfinished ...>
[pid     4] sched_getaffinity(0, 8192 <unfinished ...>
[pid     5] <... openat resumed>)       = 7</proc/3/stat>
[pid     4] <... sched_getaffinity resumed>, [1]) = 8
[pid     5] fcntl(7</proc/3/stat>, F_GETFL <unfinished ...>
[pid     4] pread64(3</sys/fs/cgroup/cpu.max> <unfinished ...>
[pid     5] <... fcntl resumed>)        = 0x8000 (flags O_RDONLY|O_LARGEFILE)
[pid     4] <... pread64 resumed>, "max 100000\n", 64, 0) = 11
[pid     5] fcntl(7</proc/3/stat>, F_SETFL, O_RDONLY|O_NONBLOCK|O_LARGEFILE <unfinished ...>
[pid     4] futex(0x5f2c52dfdc50, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid     5] <... fcntl resumed>)        = 0
[pid     3] <... futex resumed>)        = 0
[pid     4] <... futex resumed>)        = 1
[pid     3] epoll_ctl(5<anon_inode:[eventpoll]>, EPOLL_CTL_ADD, 7</proc/3/stat>, {events=EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, data=0x7ddb66bb6c000015} <unfinished ...>
[pid     5] futex(0xc0000e3158, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid     3] <... epoll_ctl resumed>)    = -1 EPERM (Operation not permitted)
[pid     4] nanosleep({tv_sec=0, tv_nsec=20000} <unfinished ...>
[pid     3] fcntl(7</proc/3/stat>, F_GETFL) = 0x8800 (flags O_RDONLY|O_NONBLOCK|O_LARGEFILE)
[pid     4] <... nanosleep resumed>, NULL) = 0
[pid     3] fcntl(7</proc/3/stat>, F_SETFL, O_RDONLY|O_LARGEFILE <unfinished ...>
[pid     4] nanosleep({tv_sec=0, tv_nsec=20000} <unfinished ...>
[pid     3] <... fcntl resumed>)        = 0
[pid     3] fstat(7</proc/3/stat> <unfinished ...>
[pid     4] <... nanosleep resumed>, NULL) = 0
[pid     3] <... fstat resumed>, {st_dev=makedev(0, 0x6c), st_ino=267663, st_mode=S_IFREG|0444, st_nlink=1, st_uid=0, st_gid=0, st_blksize=1024, st_blocks=0, st_size=0, st_atime=1759226607 /* 2025-09-30T12:03:27.583355914+0200 */, st_atime_nsec=583355914, st_mtime=1759226607 /* 2025-09-30T12:03:27.583355914+0200 */, st_mtime_nsec=583355914, st_ctime=1759226607 /* 2025-09-30T12:03:27.583355914+0200 */, st_ctime_nsec=583355914}) = 0
[pid     4] nanosleep({tv_sec=0, tv_nsec=20000} <unfinished ...>
[pid     3] read(7</proc/3/stat>, "3 (telegraf) R 1 1 1 34816 1 419"..., 512) = 317
[pid     4] <... nanosleep resumed>, NULL) = 0
[pid     3] read(7</proc/3/stat> <unfinished ...>
[pid     4] nanosleep({tv_sec=0, tv_nsec=20000} <unfinished ...>
[pid     3] <... read resumed>, "", 579) = 0
[pid     3] close(7</proc/3/stat> <unfinished ...>
[pid     4] <... nanosleep resumed>, NULL) = 0
[pid     3] <... close resumed>)        = 0
[pid     4] nanosleep({tv_sec=0, tv_nsec=20000} <unfinished ...>
[pid     3] sysinfo({uptime=8492, loads=[26912, 310112, 222848], totalram=33252589568, freeram=13363355648, sharedram=1062043648, bufferram=482197504, totalswap=17178816512, freeswap=17178816512, procs=1240, totalhigh=0, freehigh=0, mem_unit=1}) = 0
[pid     4] <... nanosleep resumed>, NULL) = 0
[pid     3] epoll_pwait(5<anon_inode:[eventpoll]> <unfinished ...>
[pid     4] nanosleep({tv_sec=0, tv_nsec=20000} <unfinished ...>
[pid     3] <... epoll_pwait resumed>, [], 128, 0, NULL, 0) = 0
[pid     3] epoll_pwait(5<anon_inode:[eventpoll]> <unfinished ...>
[pid     4] <... nanosleep resumed>, NULL) = 0
[pid     4] futex(0x5f2c52e08130, FUTEX_WAIT_PRIVATE, 0, {tv_sec=0, tv_nsec=999750431}

Good luck, anyone else!

    — Dennis
