On Wed, Dec 05, 2018 at 04:56:21PM -0500, Jeff King wrote:
> Could we just kill them all?
>
> I guess it's a little tricky, because $! is going to give us the pid of
> each subshell. We actually want to kill the test sub-process. That takes
> a few contortions, but the output is nice (every sub-job immediately
> says "ABORTED ...", and the final process does not exit until the whole
> tree is done):
It's trickier than that:
- Nowadays our test lib translates SIGINT to exit, but not TERM (or
HUP, in case a user decides to close the terminal window), which
means that if the test runs any daemons in the background, then
those won't be killed when the stress test is aborted.
This is easy enough to address, and I've been meaning to do this
in an other patch series anyway.
- With the (implied) '--verbose-log' the process killed in the
background subshell's SIGTERM handler is not the actual test
process, because '--verbose-log' runs the test again in a piped
subshell to save its output:
(GIT_TEST_TEE_STARTED=done ${TEST_SHELL_PATH} "$0" "$@" 2>&1;
echo $? >"$TEST_RESULTS_BASE.exit") | tee -a "$GIT_TEST_TEE_OUTPUT_FILE"
That 'kill' kills the "outer" shell process.
And though I get "ABORTED..." immediately and I get back my
prompt right away, the commands involved in the above construct
are still running in the background:
$ ./t3903-stash.sh --stress=1
^CABORTED 0.0
$ ps a |grep t3903
1280 pts/17 S 0:00 /bin/sh ./t3903-stash.sh --stress=1
1281 pts/17 S 0:00 tee -a
<...>/test-results/t3903-stash.stress-0.out
1282 pts/17 S 0:00 /bin/sh ./t3903-stash.sh --stress=1
4173 pts/17 S+ 0:00 grep t3903
I see this behavior with all shells I tried.
I haven't yet started to think it through why this happens :)
Not implying '--verbose-log' but redirecting the test script's
output, like you did in your 'stress' script, seems to work in
dash, ksh, and ks93, but not in Bash v4.3 or later, where, for
whatever reason, the subshell get SIGINT before the SIGTERM would
arrive.
While we could easily handle SIGINT in the subshell with the same
signal handler as SIGTERM, it bothers me that something
fundamental is wrong here.
Furthermore, then we should perhaps forbid '--stress' in
combination with '--verbose-log' or '--tee'.
> diff --git a/t/test-lib.sh b/t/test-lib.sh
> index 9b7f687396..357dead3ff 100644
> --- a/t/test-lib.sh
> +++ b/t/test-lib.sh
> @@ -98,8 +98,9 @@ done,*)
> mkdir -p "$(dirname "$TEST_RESULTS_BASE")"
> stressfail="$TEST_RESULTS_BASE.stress-failed"
> rm -f "$stressfail"
> - trap 'echo aborted >"$stressfail"' TERM INT HUP
> + trap 'echo aborted >"$stressfail"; kill $job_pids 2>/dev/null; wait'
> TERM INT HUP
>
> + job_pids=
> job_nr=0
> while test $job_nr -lt "$job_count"
> do
> @@ -108,10 +109,13 @@ done,*)
> GIT_TEST_STRESS_JOB_NR=$job_nr
> export GIT_TEST_STRESS_STARTED GIT_TEST_STRESS_JOB_NR
>
> + trap 'kill $test_pid 2>/dev/null' TERM
> +
> cnt=0
> while ! test -e "$stressfail"
> do
> - if $TEST_SHELL_PATH "$0" "$@" >/dev/null 2>&1
> + $TEST_SHELL_PATH "$0" "$@" >/dev/null 2>&1 &
> test_pid=$!
> + if wait
> then
> printf >&2 "OK %2d.%d\n"
> $GIT_TEST_STRESS_JOB_NR $cnt
> elif test -f "$stressfail" &&
> @@ -124,16 +128,11 @@ done,*)
> fi
> cnt=$(($cnt + 1))
> done
> - ) &
> + ) & job_pids="$job_pids $!"
> job_nr=$(($job_nr + 1))
> done
>
> - job_nr=0
> - while test $job_nr -lt "$job_count"
> - do
> - wait
> - job_nr=$(($job_nr + 1))
> - done
> + wait
>
> if test -f "$stressfail" && test "$(cat "$stressfail")" != "aborted"
> then
>