Re: false-positive failure of the root-removal test

Pádraig Brady Thu, 15 Oct 2015 02:34:38 -0700

On 15/10/15 09:15, Bernhard Voelker wrote:
> On 10/15/2015 03:44 AM, Pádraig Brady wrote:
>> From 1686e5cb23a8b2494bc7c3096fc8447d79421f21 Mon Sep 17 00:00:00 2001
>> From: =?UTF-8?q?P=C3=A1draig=20Brady?= <[email protected]>
>> Date: Wed, 14 Oct 2015 23:26:22 +0100
>> Subject: [PATCH] tests: avoid race under load for tests/rm/r-root.sh
> 
> nice work!
> 
>> * tests/rm/r-root.sh: Use gdb rather than timeout(1) as the
>> last resort protection against unlinkat() calls.  The timeout
>> of 2s was susceptible to false positives under load, and
>> gdb is stronger protection in any case.  We remove the
>> "expensive" tag on this test also since it should be robust.
>> Reported by Jim Meyering.
>> ---
>>  tests/rm/r-root.sh | 94 
>> +++++++++++++++++++++++++++++++++++++++---------------
>>  1 file changed, 68 insertions(+), 26 deletions(-)
>>
>> diff --git a/tests/rm/r-root.sh b/tests/rm/r-root.sh
>> index c06332a..e0564b8 100755
>> --- a/tests/rm/r-root.sh
>> +++ b/tests/rm/r-root.sh
>> @@ -32,13 +32,19 @@ skip_if_root_
>>  # LD_PRELOAD environment variable.  This requires shared libraries to work.
>>  require_gcc_shared_
>>  
>> -# This isn't terribly expensive, but it must not be run under heavy load.
>> -# The reason is the conservative 'timeout' setting below to limit possible
>> -# damage in the worst case which yields a race under heavy load.
>> -# Marking this test as "expensive" therefore is a compromise, i.e., adding
>> -# this test to the list ensures it still gets _some_ (albeit minimal)
>> -# coverage while not causing false-positive failures in day to day runs.
>> -expensive_
> 
> FWIW: the test now is really now taking longer (~3.3s instead of ~1.4s 
> before).
> Do we have a threshold/guideline as when to mark tests 'expensive_'?


Anything over around 5s on a modern machine I would think.

>> +# Use gdb to provide further protection by limiting calls to unlinkat().
>> +( timeout 10s gdb --version ) > gdb.out 2>&1
>> +case $(cat gdb.out) in
>> +    *'GNU gdb'*) ;;
>> +    *) skip_ "can't run gdb";;
>> +esac
>> +
>> +# Break on a line rather than a symbol, to cater for inline functions
>> +break_src="$abs_top_srcdir/src/remove.c"
>> +break_line=$(grep -n ^excise "$break_src") || framework_failure_
>> +break_line=$(echo "$break_line" | cut -d: -f1) || framework_failure_
>> +break_line="$break_src:$break_line"
>> +
> 
> this doesn't double-check that grep+cut above really find the source
> line (imagine someone would rename that source/function in future), and
> if $break_line is something usable.  Well, GDB may complain with
>     "malformed linespec error: unexpected end of input"
> when e.g. the line number is missing, but I wouldn't rely on that;
> maybe add another tiny pattern check here.

I was relying on the witness file "excise.break" to check
that the line, breakpoint, python support, etc. was working.

> 
>>  cat > k.c <<'EOF' || framework_failure_
>>  #include <stdio.h>
>> @@ -63,6 +69,26 @@ EOF
>>  gcc_shared_ k.c k.so \
>>    || framework_failure_ 'failed to build shared library'
>>  
>> +# Note breakpoint commands don't work in batch mode
>> +# https://sourceware.org/bugzilla/show_bug.cgi?id=10079
>> +# So we use python to script behavior upon hitting the breakpoint
>> +cat > bp.py <<'EOF.py' || framework_failure_
>> +def breakpoint_handler (event):
>> +  if not isinstance(event, gdb.BreakpointEvent):
>> +    return
>> +  hit_count = event.breakpoints[0].hit_count
>> +  if hit_count == 1:
>> +    gdb.execute('shell touch excise.break')
>> +    gdb.execute('continue')
>> +  elif hit_count > 2:
>> +    gdb.write('breakpoint hit twice already')
> 
> do we see this output somewhere?

Without --batch-silent it is output.

>> +  gdb -nx --batch-silent -return-child-result                               
>> \
>> +    --eval-command="set exec-wrapper                                        
>> \
>> +                     env 'LD_PRELOAD=$LD_PRELOAD:./k.so' $skip_exit"        
>> \
> 
> Would you mind folding the env LD_PRELOAD + $skip_exit stuff into one line?
> It a) looks a bit odd in the test source and the log, and b) maybe some
> strange GDB version may not like newlines in there (still paranoia from
> my side).

The newline is not passed to gdb.
It's more awkward to adjust while staying within the mandated line length.

>> +#   rm: it is dangerous to operate recursively on 'FILE' (same as '/')
>> +# Strip that part off for the following comparison.
>> +clean_rm_err()
>> +{
>> +  sed "s/.*rm: /rm: /; \
>> +       s/\(rm: it is dangerous to operate recursively on\).*$/\1 '\/'/"
>> +}
>> +
> 
> It seems that this is not enough.  At least on openSUSE:13.2,
> I'm getting this:
> 
> + compare_ exp err2
> + diff -u exp err2
> --- exp 2015-10-15 09:42:47.956590048 +0200
> +++ err2        2015-10-15 09:42:48.051591245 +0200
> @@ -1,2 +1,3 @@
> +Got object file from memory but can't read symbols: File truncated.
>  rm: it is dangerous to operate recursively on '/'
>  rm: use --no-preserve-root to override this failsafe
> + fail=1
> 
> This GDB bug seems to be fixed meanwhile, but I can't check on a newer
> system right now.

Ugh. I'll add an initial check to skip in this case.

> BTW: I suggest moving the sed call into exercise_rm_r_root: nw, there
> are many pieces playing together (returns_, gdb, python, LD_PRELOAD,
> CU_TEST_SKIP_EXIT, the witness files, ect.), so the move would make
> reading easier.

Done

>> -  grep "^rm: refusing to remove '\.' or '\.\.' directory: skipping" err \
>> +  grep "rm: refusing to remove '\.' or '\.\.' directory: skipping" err \

> There's another place where to remove the '^' from the pattern (line 286):
> 
> grep "^rm: it is dangerous to operate recursively on '/'" err && fail=1

Good catch.

I already pushed unfortunately.
How about the attached on top?

thanks for the review!
Pádraig

From c02dde77680c8e5ea758dc11b0d164af76d996ca Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?P=C3=A1draig=20Brady?= <[email protected]>
Date: Thu, 15 Oct 2015 10:22:20 +0100
Subject: [PATCH] tests: avoid false failure in rm/r-root.sh with gdb warnings

* tests/rm/r-root.sh: SKip the test if there are gdb warnings
that will impact further stderr checks.  For example some
buggy gdb versions may report "Got object file from memory
but can't read symbols: File truncated".  Also fix an incorrect
stderr check from the previous change.
Reported by Bernhard Voelker.
---
 tests/rm/r-root.sh | 55 +++++++++++++++++++++++++++---------------------------
 1 file changed, 28 insertions(+), 27 deletions(-)

diff --git a/tests/rm/r-root.sh b/tests/rm/r-root.sh
index e0564b8..6754c10 100755
--- a/tests/rm/r-root.sh
+++ b/tests/rm/r-root.sh
@@ -89,6 +89,19 @@ def breakpoint_handler (event):
 gdb.events.stop.connect(breakpoint_handler)
 EOF.py
 
+# In order of the sed expressions below, this cleans:
+#
+# 1. gdb uses the full path when running rm, so remove the leading dirs.
+# 2. For some of the "/" synonyms, the error diagnostic slightly differs from
+# that of the basic "/" case (see gnulib's fts_open' and ROOT_DEV_INO_WARN):
+#   rm: it is dangerous to operate recursively on 'FILE' (same as '/')
+# Strip that part off for the following comparison.
+clean_rm_err_()
+{
+  sed "s/.*rm: /rm: /; \
+       s/\(rm: it is dangerous to operate recursively on\).*$/\1 '\/'/"
+}
+
 #-------------------------------------------------------------------------------
 # exercise_rm_r_root: shell function to test "rm -r '/'"
 # The caller must provide the FILE to remove as well as any options
@@ -121,9 +134,13 @@ exercise_rm_r_root ()
     --eval-command='source bp.py'					\
     --eval-command="run -rv --one-file-system $*"			\
     --eval-command='quit'						\
-    rm < /dev/null > out 2> err
+    rm < /dev/null > out 2> err.t
 
-  return $?
+  ret=$?
+
+  clean_rm_err_ < err.t > err || framework_failure_
+
+  return $ret
 }
 
 # Verify that "rm -r dir" basically works.
@@ -143,6 +160,7 @@ for file in dir file ; do
   test -e "$file"            || skip=1
   test -f x                  || skip=1
   test -f excise.break       || skip=1  # gdb works and breakpoint hit
+  compare /dev/null err      || skip=1
 
   test "$skip" = 1 \
     && { cat out; cat err; \
@@ -156,19 +174,6 @@ rm: it is dangerous to operate recursively on '/'
 rm: use --no-preserve-root to override this failsafe
 EOD
 
-# In order of the sed expressions below, this cleans:
-#
-# 1. gdb uses the full path when running rm, so remove the leading dirs.
-# 2. For some of the "/" synonyms, the error diagnostic slightly differs from
-# that of the basic "/" case (see gnulib's fts_open' and ROOT_DEV_INO_WARN):
-#   rm: it is dangerous to operate recursively on 'FILE' (same as '/')
-# Strip that part off for the following comparison.
-clean_rm_err()
-{
-  sed "s/.*rm: /rm: /; \
-       s/\(rm: it is dangerous to operate recursively on\).*$/\1 '\/'/"
-}
-
 #-------------------------------------------------------------------------------
 # Exercise "rm -r /" without and with the --preserve-root option.
 # Exercise various synonyms of "/" including symlinks to it.
@@ -190,14 +195,12 @@ for opts in           \
 
   returns_ 1 exercise_rm_r_root $opts || fail=1
 
-  clean_rm_err < err > err2 || framework_failure_
-
-  # Expect nothing in 'out' and the above error diagnostic in 'err2'.
+  # Expect nothing in 'out' and the above error diagnostic in 'err'.
   # As rm(1) should have skipped the "/" argument, it does not call unlinkat().
   # Therefore, the evidence file "x" should not exist.
-  compare /dev/null out  || fail=1
-  compare exp       err2 || fail=1
-  test -f x              && fail=1
+  compare /dev/null out || fail=1
+  compare exp       err || fail=1
+  test -f x             && fail=1
 
   # Do nothing more if this test failed.
   test $fail = 1 && { cat out; cat err; Exit $fail; }
@@ -219,8 +222,6 @@ CU_TEST_SKIP_EXIT=1
 
 returns_ 1 exercise_rm_r_root --preserve-root file1 '/' file2 || fail=1
 
-clean_rm_err < err > err2 || framework_failure_
-
 unset CU_TEST_SKIP_EXIT
 
 cat <<EOD > out_removed
@@ -232,9 +233,9 @@ EOD
 # Both 'file1' and 'file2' should be removed.  Simply verify that in the
 # "out" file, as the replacement unlinkat() dummy did not remove them.
 # Expect the evidence file "x" to exist.
-compare out_removed out  || fail=1
-compare exp         err2 || fail=1
-test -f x                || fail=1
+compare out_removed out || fail=1
+compare exp         err || fail=1
+test -f x               || fail=1
 
 # Do nothing more if this test failed.
 test $fail = 1 && { cat out; cat err; Exit $fail; }
@@ -283,7 +284,7 @@ exercise_rm_r_root  --interactive=never --no-preserve-root '/' \
   || fail=1
 
 # The 'err' file should not contain the above error diagnostic.
-grep "^rm: it is dangerous to operate recursively on '/'" err && fail=1
+grep "rm: it is dangerous to operate recursively on '/'" err && fail=1
 
 # Instead, rm(1) should have called the intercepted unlinkat() function,
 # i.e., the evidence file "x" should exist.
-- 
2.5.0

Re: false-positive failure of the root-removal test

Reply via email to