On 02/08/2013 05:09 PM, Bernhard Voelker wrote:
On February 8, 2013 at 5:30 PM "Pádraig Brady" <[email protected]> wrote:
On 02/08/2013 02:53 PM, Bernhard Voelker wrote:
On February 7, 2013 at 8:57 PM "Pádraig Brady" <[email protected]> wrote:
* SLES-10.3 (i586):
    gcc (GCC) 4.1.2 20070115 (SUSE Linux)

    FAIL: tests/tail-2/inotify-rotate.sh
          NFS issue during cleanup_
          reproduced: 2x out of 2 tries.

If you put a "wait" before the "Exit" at the end of that test, does it help?
As a less desirable solution we could put require_local_dir_ at the top
of this test.

Unfortunately not:

I think I see what's happening.  With /bin/sh -> bash,
bash will create a redundant subshell for the : && timeout ... construct.
I.e., this prints bash rather than timeout:

  : && timeout 5 sleep 1 & readlink /proc/$!/exe

So if you had dash installed, this would probably work:

make check SHELL=dash TESTS="tests/tail-2/inotify-rotate.sh" \
  SUBDIRS=. VERBOSE=yes RUN_EXPENSIVE_TESTS=yes

The subshell causes a problem because, being part of a
non-interactive script, it won't automatically send SIGHUP
to timeout when it receives a SIGTERM. The timeout processes,
and thus the tail processes accessing the files, therefore
hang around until the 40s timeout expires.
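
To illustrate the effect in isolation, here is a minimal sketch. It is
not taken from the test itself: it forces the wrapper with an explicit
( ...; : ) subshell so the behaviour does not depend on which shell or
version is in use, and the names tmp, wrapper and child are made up for
the example. The SIGTERM goes to the wrapper only, and the child keeps
running:

```shell
#!/bin/sh
# Sketch only: an explicit subshell stands in for the implicit one
# that bash creates for ": && timeout ... &".
tmp=$(mktemp -d) || exit 1

# The trailing ':' stops the subshell from exec'ing the child directly,
# so $! is the pid of the wrapper, not of sleep.
( sh -c 'echo $$ > "$1"; exec sleep 30' sh "$tmp/child.pid"; : ) &
wrapper=$!

# Wait until the child has recorded its own pid.
while ! test -s "$tmp/child.pid"; do sleep 1; done
child=$(cat "$tmp/child.pid")

# SIGTERM reaches the wrapper only; it is not passed on to the child.
kill -TERM "$wrapper"
wait "$wrapper" 2>/dev/null

if kill -0 "$child" 2>/dev/null; then
  result="child survived"
else
  result="child exited"
fi
echo "$result"

# Clean up the orphaned child and the scratch directory.
kill -TERM "$child" 2>/dev/null
rm -rf "$tmp"
```

In the test, the surviving child corresponds to timeout and tail, which
keep the files open and so leave .nfs files behind during cleanup.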

In any case the attached should avoid the subshell
and hopefully fix the issue.

BTW I found a related signal race in bash when looking at this,
where it will ignore the SIGTERM before it execs the background
command, but I'm fairly sure we're not hitting this here
and so will only send details of that to the bash list.

Why do we need this in remove_tmp_ ()?

   # If removal fails and exit status was to be 0, then change it to 1.
   rm -rf "$test_dir_" || { test $__st = 0 && __st=1; }

I think that's fine. It's indicating the error which
we would have missed otherwise.
Maybe it should call framework_failure_ or something,
rather than set __st=1, but that's a minor detail.
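
For reference, the status handling in that idiom can be sketched on its
own. The helper name run_cleanup_ below is hypothetical and not part of
the test framework. It only shows that a failing cleanup command turns a
status of 0 into 1, while an already nonzero status is left alone:

```shell
#!/bin/sh
# Hypothetical helper, for illustration only: run a cleanup command and
# print the resulting exit status, using the same idiom as remove_tmp_.
run_cleanup_() {
  __st=$1; shift
  # If the cleanup fails and the status was to be 0, change it to 1.
  "$@" || { test "$__st" = 0 && __st=1; }
  echo "$__st"
}

ok_status=$(run_cleanup_ 0 true)      # cleanup succeeds, status stays 0
promoted=$(run_cleanup_ 0 false)      # cleanup fails, 0 becomes 1
preserved=$(run_cleanup_ 2 false)     # cleanup fails, earlier failure kept

echo "$ok_status $promoted $preserved"
```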

cheers,
Pádraig.
From 1420fb36651d105c96eee2ace3563a32c1c0a1af Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?P=C3=A1draig=20Brady?= <[email protected]>
Date: Sat, 9 Feb 2013 04:39:40 +0000
Subject: [PATCH] tests: tail-2/inotify-rotate: fix a false failure on NFS

* tests/tail-2/inotify-rotate.sh: Avoid a subshell with bash,
which in turn causes the `kill` to be ineffective to the tail
processes (as the SIGTERM is sent to the subshell which doesn't
propagate the signal on to its children).  On NFS the test
cleanup will then fail as there will be .nfs files maintained
in the directory for the files still opened by the tail processes.
Reported by Bernhard Voelker.
---
 tests/tail-2/inotify-rotate.sh |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/tests/tail-2/inotify-rotate.sh b/tests/tail-2/inotify-rotate.sh
index 94c2b7d..4a16202 100755
--- a/tests/tail-2/inotify-rotate.sh
+++ b/tests/tail-2/inotify-rotate.sh
@@ -47,7 +47,8 @@ for i in $(seq 50); do
     # Normally less than a second is required here, but with heavy load
     # and a lot of disk activity, even 20 seconds is insufficient, which
     # leads to this timeout killing tail before the "ok" is written below.
-    :>k && :>x && timeout 40 tail -F k > out 2>&1 &
+    :>k && :>x || framework_failure_ failed to initialize files
+    timeout 40 tail -F k > out 2>&1 &
     pid=$!
     sleep .1
     echo b > k;
@@ -65,4 +66,5 @@ for i in $(seq 50); do
     test $found = 0 && { cat out; fail_ failed to detect echoed '"ok"'; }
 done
 
+wait
 Exit $fail
-- 
1.7.7.6
