When running a massively parallel "make very-expensive-check"
(-j73 on a 48-core system), I found that the rm/r-root.sh test failed
in roughly 1 of every 2 or 3 trials because the following 2-second
timeout expired:
diff --git a/tests/rm/r-root.sh b/tests/rm/r-root.sh
index c06332a..4e645e6 100755
--- a/tests/rm/r-root.sh
+++ b/tests/rm/r-root.sh
@@ -88,7 +88,7 @@ exercise_rm_r_root ()
     skip_exit='CU_TEST_SKIP_EXIT=1'
   fi
 
-  timeout --signal=KILL 2 \
+  timeout --signal=KILL 5 \
     env LD_PRELOAD=$LD_PRELOAD:./k.so $skip_exit \
       rm -rv --one-file-system "$@" > out 2> err
 
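Failure rates like the ones quoted here come from rerunning the test and counting how many runs fail. The sketch below is a minimal version of that loop. The helper name count_failures is hypothetical. The command you pass it stands in for however the single test is rerun in your tree, which the original message does not show.

```shell
#!/bin/sh
# count_failures N CMD [ARGS...]
# Hypothetical helper: run CMD (with its arguments) N times and print
# how many of those runs exited with a nonzero status.
count_failures ()
{
  n=$1
  shift
  fails=0
  i=0
  while [ "$i" -lt "$n" ]; do
    # Discard the command's output; only its exit status matters here.
    "$@" > /dev/null 2>&1 || fails=$((fails + 1))
    i=$((i + 1))
  done
  echo "$fails"
}
```

With a test that fails about 1 time in 2 or 3, a handful of consecutive passes is encouraging but not conclusive. That is why more runs after a change are worth the time.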
After making the above change, I saw the whole test succeed 6 times
in a row. Then I read the comment above the changed code:
# exercise_rm_r_root: shell function to test "rm -r '/'"
# The caller must provide the FILE to remove as well as any options
# which should be passed to 'rm'.
# Paranoia mode on:
# For the worst case where both rm(1) would fail to refuse to process the "/"
# argument (in the cases without the --no-preserve-root option), and
# intercepting the unlinkat(2) system call would fail (which actually already
# has been proven to work above), and the current non-root user has
# write access to "/", limit the damage to the current file system via
# the --one-file-system option.
# Furthermore, run rm(1) via timeout(1) that kills that process after
# a maximum of 2 seconds.
So perhaps we should compromise at 3 seconds? With that value, the test
has passed 4 times so far.
Probably better still, I'll keep this in mind and reduce the -j argument
from 1+3N/2 (where N is the number of cores) to something slightly less
abusive.
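For reference, the 1+3N/2 heuristic is what yields -j73 on the 48-core system mentioned at the top, since 1 + 3*48/2 = 73. The sketch below computes that value. It assumes N comes from GNU nproc. The helper name jobs_for_cores is hypothetical.

```shell
#!/bin/sh
# jobs_for_cores N: print the -j value 1 + 3N/2, using shell integer
# arithmetic (the multiplication happens before the truncating division).
jobs_for_cores ()
{
  echo $((1 + 3 * $1 / 2))
}

# Example: derive the job count from the number of available processors
# as reported by nproc (GNU coreutils).
jobs=$(jobs_for_cores "$(nproc)")
echo "make -j$jobs very-expensive-check"
```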