On 23/02/2026 11:45, jvogel--- via GNU coreutils Bug Reports wrote:
Running the coreutils 9.10 test suite on the Alpine Linux builder results in
a hangup:

make[1]: *** [Makefile:24419: check-recursive] Hangup
make[3]: *** [Makefile:24668: check-TESTS] Hangup
make: *** [Makefile:24922: check] Hangup
make[2]: *** [Makefile:24920: check-am] Hangup
make[4]: *** [Makefile:24685: tests/misc/usage_vs_refs.log] Error 129

The two tests that trigger the failure are tests/tail/overlay-headers.sh
and tests/timeout/timeout.sh. The hangup comes from an interaction between
them, and can be reproduced with only those two tests enabled.
An strace log is available at [1].
The builders are run as a service, so no tty.

To reproduce, I ran `setsid make -j2 check` with only the two tests
mentioned above enabled.

Since the build is being run without a controlling terminal attached,
it looks like a signal is getting propagated up to a parent process.
After reverting the e1cbe82cc6131b2cb8441b948f75a0eb28bdcc40 patch [2],
our build and test run of coreutils 9.10 completes successfully.

[1] https://dev.alpinelinux.org/~kevin/coreutils-tests.strace
[2] https://gitlab.alpinelinux.org/alpine/aports/-/commit/5c4bc917b78b59204aacd620103916f1a0c1542a
Thank you for the excellent bug report!

At first I could not reproduce this on my system,
but I could after widening the race window
by putting a `sleep 10` after `retry_delay_ wait4stopped_`
in tests/tail/overlay-headers.sh and running:

setsid make TESTS="tests/timeout/timeout.sh \
 tests/tail/overlay-headers.sh" SUBDIRS=. -j2 check

I think the issue was latent (with small window) since v8.12-138-g7e576fc40,
inadvertently avoided with v9.7-325-g4ca51b101,
but then reintroduced (with wider window) with v9.9-163-ge1cbe82cc.

In combination, these tests can trigger the kernel's job control
mechanism for preventing stuck processes.
I.e. the kernel sends SIGHUP + SIGCONT to a process group
when it determines that the group may become orphaned
and there are stopped processes in that group.

Specifically, in our case this happens when timeout.sh
runs (sleep .1 & exec timeout 0.5 ...). When timeout exits,
sleep is reparented (as timeout doesn't reap all children),
and this reparenting triggers the kernel check of
the main test process group, which depending on timing
may have tail in the STOP state in another test,
thus causing the kernel to send SIGHUP + SIGCONT.
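
The reparenting step on its own can be observed with a small script.
This is only a Linux-specific sketch and not part of the patch: it
assumes /proc is mounted, ppid_of is a hypothetical helper, and `true`
stands in for timeout so that the exec'd process exits at once.

```shell
#!/bin/sh
# Hypothetical helper: print the parent PID of process $1 (Linux /proc only).
ppid_of() {
  # Drop "PID (comm) " so a command name containing spaces cannot shift
  # the fields; what remains starts with: state ppid pgrp session ...
  sed 's/.*) //' "/proc/$1/stat" | cut -d' ' -f2
}

info=$(mktemp) || exit 1

# The subshell starts sleep, records sleep's PID and its current parent,
# then replaces itself with a command that exits immediately.
# "true" is a stand-in for timeout(1) here.
( sleep 5 & echo "$! $(ppid_of $!)" > "$info"; exec true )

read -r child before < "$info"
after=$(ppid_of "$child")
echo "sleep $child: parent was $before, is now $after"

# Clean up the orphaned sleep and the scratch file.
kill "$child" 2>/dev/null
rm -f "$info"
```

The parent reported afterwards is typically PID 1, or a subreaper
such as a service manager.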

Note this doesn't happen when run from an interactive shell,
as the shell is in a different group in the same session.
The kernel thus determines that the main test process group
cannot become orphaned, and so does not trigger the SIGHUP mechanism.
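
As a rough way to see which situation a given run is in, one can
compare the group and session of a shell with those of its parent.
This is a simplified, Linux-specific sketch: stat_field is a
hypothetical helper, /proc is assumed to be mounted, and only one
member of the group is inspected, whereas the kernel considers the
parent of every member.

```shell
#!/bin/sh
# Hypothetical helper: field $2 of /proc/$1/stat, counted after the
# "(comm)" entry, so 1=state 2=ppid 3=pgrp 4=session.
stat_field() {
  sed 's/.*) //' "/proc/$1/stat" | cut -d' ' -f"$2"
}

self=$$
parent=$(stat_field "$self" 2)
self_pgid=$(stat_field "$self" 3)
self_sid=$(stat_field "$self" 4)

# The parent may be invisible, e.g. across a PID namespace boundary.
if [ -r "/proc/$parent/stat" ]; then
  parent_pgid=$(stat_field "$parent" 3)
  parent_sid=$(stat_field "$parent" 4)
else
  parent_pgid=unknown
  parent_sid=unknown
fi

# A parent in another group of the same session keeps our group
# from counting as orphaned.
if [ "$self_sid" = "$parent_sid" ] && [ "$self_pgid" != "$parent_pgid" ]; then
  verdict="parent is in another group of the same session"
else
  verdict="parent does not anchor this group"
fi

echo "shell  $self: pgid=$self_pgid sid=$self_sid"
echo "parent $parent: pgid=$parent_pgid sid=$parent_sid"
echo "$verdict"
```

Run under `setsid`, the script's parent is in a different session,
so the second verdict is expected there.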

The attached patch addresses this by creating separate
process groups in both these tests.  Either would suffice
to avoid the issue, but we adjust both to be defensive.
There are no other tests that stop processes like this.
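
The effect of setsid on group membership can be checked as follows.
Again this is only a sketch under assumptions: Linux with /proc
mounted, setsid(1) installed, a sleep that accepts fractional
seconds, and a non-interactive shell (without job control) running
the script. pgid_of is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: print the process group ID of process $1.
pgid_of() {
  # Fields after "(comm)" are: state ppid pgrp session ...
  sed 's/.*) //' "/proc/$1/stat" | cut -d' ' -f3
}

shell_pgid=$(pgid_of $$)

# Without job control a background child stays in the shell's group.
sleep 5 & plain=$!

# With setsid the child becomes leader of a new session and group.
# Note in an interactive shell setsid may fork first, in which case
# $! would not be the PID of the final process.
setsid sleep 5 & detached=$!

# setsid needs a moment to run before the new group exists, so poll briefly.
tries=0
while [ "$(pgid_of "$detached")" != "$detached" ] && [ "$tries" -lt 50 ]; do
  sleep 0.1
  tries=$((tries + 1))
done

plain_pgid=$(pgid_of "$plain")
detached_pgid=$(pgid_of "$detached")
echo "shell group:        $shell_pgid"
echo "plain sleep group:  $plain_pgid"
echo "setsid sleep group: $detached_pgid"

kill "$plain" "$detached" 2>/dev/null
```

A process started this way is outside the test harness's group, so
stopping or reparenting it does not involve that group in the
kernel's orphaned group check.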

Marking this as done.

thanks,
Padraig
From 21d287324aa43aa3a31f39619ade0deac7fd6013 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?P=C3=A1draig=20Brady?= <[email protected]>
Date: Tue, 24 Feb 2026 15:44:41 +0000
Subject: [PATCH] tests: fix job control triggering test termination

This avoids the test harness being terminated like:
  make[1]: *** [Makefile:24419: check-recursive] Hangup
  make[3]: *** [Makefile:24668: check-TESTS] Hangup
  make: *** [Makefile:24922: check] Hangup
  make[2]: *** [Makefile:24920: check-am] Hangup
  make[4]: *** [Makefile:24685: tests/misc/usage_vs_refs.log] Error 129
  ...

This happened sometimes when the tests were being run non-interactively.
For example when run like:

  setsid make TESTS="tests/timeout/timeout.sh \
   tests/tail/overlay-headers.sh" SUBDIRS=. -j2 check

Note the race window can be made bigger by adding a sleep
after tail is stopped in overlay-headers.sh.

The race can trigger the kernel to induce its job control
mechanism to prevent stuck processes.
I.e. where it sends SIGHUP + SIGCONT to a process group
when it determines that group may become orphaned,
and there are stopped processes in that group.

* tests/tail/overlay-headers.sh: Use setsid(1) to keep the stopped
tail process in a separate process group, thus avoiding any kernel
job control protection mechanism.
* tests/timeout/timeout.sh: Use setsid(1) to avoid the kernel
checking the main process group when sleep(1) is reparented.
Fixes https://bugs.gnu.org/80477
---
 tests/tail/overlay-headers.sh |  8 +++++++-
 tests/timeout/timeout.sh      | 11 ++++++++---
 2 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/tests/tail/overlay-headers.sh b/tests/tail/overlay-headers.sh
index be9b6a7df..1e6da0a3f 100755
--- a/tests/tail/overlay-headers.sh
+++ b/tests/tail/overlay-headers.sh
@@ -20,6 +20,8 @@
 . "${srcdir=.}/tests/init.sh"; path_prepend_ ./src
 print_ver_ tail sleep
 
+setsid true || skip_ 'setsid required to control groups'
+
 # Function to count number of lines from tail
 # while ignoring transient errors due to resource limits
 countlines_ ()
@@ -54,7 +56,11 @@ echo start > file2 || framework_failure_
 env sleep 60 & sleep=$!
 
 # Note don't use timeout(1) here as it currently
-# does not propagate SIGCONT
+# does not propagate SIGCONT.
+# Note use setsid here to ensure we're in a separate process group
+# as we're going to STOP this tail process, and this can trigger
+# the kernel to send SIGHUP to a group if other tests have
+# processes that are reparented. (See tests/timeout/timeout.sh).
 tail $fastpoll --pid=$sleep -f file1 file2 > out & pid=$!
 
 # Ensure tail is running
diff --git a/tests/timeout/timeout.sh b/tests/timeout/timeout.sh
index 9a395416b..fbb043312 100755
--- a/tests/timeout/timeout.sh
+++ b/tests/timeout/timeout.sh
@@ -56,9 +56,14 @@ returns_ 124 timeout --foreground -s0 -k1 .1 sleep 10 && fail=1
 ) || fail=1
 
 # Don't be confused when starting off with a child (Bug#9098).
-out=$(sleep .1 & exec timeout .5 sh -c 'sleep 2; echo foo')
-status=$?
-test "$out" = "" && test $status = 124 || fail=1
+# Use setsid to avoid sleep being in the test's process group, as
+# upon reparenting it can trigger an orphaned process group SIGHUP
+# (if there were stopped processes in other tests).
+if setsid true; then
+  out=$(setsid sleep .1 & exec timeout .5 sh -c 'sleep 2; echo foo')
+  status=$?
+  test "$out" = "" && test $status = 124 || fail=1
+fi
 
 # Verify --verbose output
 cat > exp <<\EOF
-- 
2.53.0
