On Wed, Oct 14, 2015 at 10:43 AM, Jim Meyering <[email protected]> wrote:
> Running a massively parallel "make very-expensive-check"
> (-j73 on a 48-core system), the rm/r-root.sh test would fail
> about 1-in-2 or 1-in-3 trials due to expiration of the 2-second
> timeout here:
>
> diff --git a/tests/rm/r-root.sh b/tests/rm/r-root.sh
> index c06332a..4e645e6 100755
> --- a/tests/rm/r-root.sh
> +++ b/tests/rm/r-root.sh
> @@ -88,7 +88,7 @@ exercise_rm_r_root ()
>      skip_exit='CU_TEST_SKIP_EXIT=1'
>    fi
>
> -  timeout --signal=KILL 2 \
> +  timeout --signal=KILL 5 \
>      env LD_PRELOAD=$LD_PRELOAD:./k.so $skip_exit \
>      rm -rv --one-file-system "$@" > out 2> err
>
> I made the above change and observed that the whole test then
> succeeded 6 times in a row. Then I read the comment above that change:
>
>   # exercise_rm_r_root: shell function to test "rm -r '/'"
>   # The caller must provide the FILE to remove as well as any options
>   # which should be passed to 'rm'.
>   # Paranoia mode on:
>   # For the worst case where both rm(1) would fail to refuse to process the "/"
>   # argument (in the cases without the --no-preserve-root option), and
>   # intercepting the unlinkat(1) system call would fail (which actually already
>   # has been proven to work above), and the current non root user has
>   # write access to "/", limit the damage to the current file system via
>   # the --one-file-system option.
>   # Furthermore, run rm(1) via timeout(1) that kills that process after
>   # a maximum of 2 seconds.
>
> So maybe compromise at 3 seconds (with that, it's passed 4 times so far)?
> Probably better still: I'll remember this and decrease -j's argument from
> 1+3N/2 to something slightly less abusive.

FYI, while trying to confirm that "3" is sufficient, I hit another
failure, but now in another race-susceptible test:

+ diff -u exp out
--- exp 2015-10-14 11:26:05.424685178 -0700
+++ out 2015-10-14 11:26:05.424685178 -0700
@@ -1 +0,0 @@
-line
+ fail=1
+ Exit 1
+ set +e
+ exit 1
+ exit 1
+ remove_tmp_
+ __st=1
+ cleanup_
+ :
+ cd /data/users/meyering/w/co/cu
+ chmod -R u+rwx /data/users/meyering/w/co/cu/gt-follow-stdin.sh.y0sA
+ rm -rf /data/users/meyering/w/co/cu/gt-follow-stdin.sh.y0sA
+ exit 1
FAIL tests/tail-2/follow-stdin.sh (exit status: 1)

So I'll just remember to use reduced parallelism for this task.
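
For reference, a rough sketch of the arithmetic behind the -j value. The helper names aggressive_jobs and modest_jobs are made up for illustration; the first reproduces the 1+3N/2 formula quoted above (73 for N=48), and the second is only an assumed gentler choice of one job per core, not a setting anyone has validated against these tests.

```shell
# Hypothetical helpers: compute a "make -j" argument for N cores.

# The 1+3N/2 formula quoted in the message (integer arithmetic).
aggressive_jobs() { echo $(( 1 + 3 * $1 / 2 )); }

# Assumed gentler alternative: one job per core.
modest_jobs() { echo "$1"; }

n=48   # core count of the system mentioned in the message
echo "aggressive: -j$(aggressive_jobs "$n")"
echo "modest:     -j$(modest_jobs "$n")"
```

On a given machine, something like make -j"$(modest_jobs "$(nproc)")" very-expensive-check would then use the reduced setting; whether that is low enough to avoid the timing-sensitive failures is untested.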
