autotest vs. in-use directories

Eric Blake Thu, 09 Apr 2009 10:27:43 -0700

I discovered this on cygwin 1.7, which is attempting to add the ability to 
emulate deleting a directory even when a process holds an open handle to that 
directory.  Cygwin 1.5 flat out fails, as permitted by POSIX, because Windows 
does not permit deleting a file or directory that has an open handle, and 
autotest already has code that gracefully deals with this failure.  The 
cygwin 1.7 emulation instead detects a failure to delete because of an open 
handle, and falls back to renaming the affected file to move it into the 
recycle bin.  In most cases, this makes it appear that an in-use file has been 
successfully unlinked: no new process can access the deleted contents, but 
existing processes can hold on to the file for as long as they need, and the 
file finally disappears when the last handle closes.  Unfortunately, with the 
current state of cygwin 1.7, the rename does not affect subsequent readdir, so 
a listing of the parent directory still shows the supposedly deleted subdir; 
also, it does not permit creating a new file or directory with the name of the 
old deleted file.  As a result, several tests fail on cygwin 1.7 that used to 
pass on cygwin 1.5; all of them have semantics similar to:


(cd micro-test.dir/1 && ./run)

The problem is that the shell invoking ./run must stick around to wait for its 
return status, thus keeping a handle open on micro-test.dir/1.  However, the 
autotest-generated run scripts proceed to cd back to the top testsuite 
directory, nuke the per-test directory, and repopulate it from scratch.  Under 
cygwin 1.5, nuking the per-test directory fails (it is still in use by the 
shell waiting for ./run to complete), but we ignore that failure and are able 
to reuse the (now-empty) directory, effectively repopulating it from scratch 
anyway.  Under cygwin 1.7, however, the nuke succeeds, and we are then unable 
to recreate the directory because of the Windows limitation mentioned above.  
Changing the testsuite to use 'exec ./run' instead of './run' solves the 
problem, because there is no longer a parent shell waiting for status from 
within that directory, and thus no longer any process keeping the directory 
handle in use while run nukes and rebuilds the per-test directory.
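
To illustrate the two invocation forms, here is a minimal sketch.  The 
directory layout and the dummy run script are stand-ins I made up for the 
demonstration (real run scripts are generated by autotest, and demo_top is a 
hypothetical name).  It only shows that the exec form replaces the subshell 
with run while still reporting run's exit status:

```shell
# Hypothetical stand-in for the testsuite layout; not autotest output.
demo_top=$(mktemp -d)
mkdir -p "$demo_top/micro-test.dir/1"
cat > "$demo_top/micro-test.dir/1/run" <<'EOF'
#!/bin/sh
# Dummy run script: report and exit with a recognizable status.
echo "run started"
exit 3
EOF
chmod +x "$demo_top/micro-test.dir/1/run"

# Plain form: the subshell normally forks ./run and waits for it, so a
# shell whose working directory is micro-test.dir/1 stays alive meanwhile
# (some shells optimize the fork away on their own).
(cd "$demo_top/micro-test.dir/1" && ./run)
status_plain=$?

# exec form: the subshell is replaced by ./run, so no shell is left
# waiting inside the directory; the exit status is still reported.
(cd "$demo_top/micro-test.dir/1" && exec ./run)
status_exec=$?

echo "plain=$status_plain exec=$status_exec"
rm -rf "$demo_top"
```

Both forms report the same status to the caller; the difference is only in 
which process, if any, remains parked in the per-test directory.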

But in thinking about the issue, it affects more than just cygwin.  Consider a 
Unix system with an NFS mount (I tested on Solaris 8).  There is no restriction 
against deleting a directory that is in use by another process, nor against 
recreating a new directory by the same name.  But the process that had the 
directory ripped out from under it does NOT see the recreated directory of the 
same name:

1$ cd /tmp
1$ mkdir foo
1$ cd foo
1$ touch bar
1$ ls
bar

2$ cd /tmp
2$ rm -Rf foo
2$ mkdir foo
2$ cd foo
2$ touch blah
2$ ls
blah

1$ ls
ls: reading directory .: Stale NFS file handle

In other words, the fact that the testsuite is trying to nuke the _entire_ 
per-test directory, rather than just its contents, means that any user who 
does 'cd testsuite.dir/nnn; ./run' has given their current shell a stale 
directory handle for $PWD.
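
The two-terminal session above can be approximated in a single script.  This 
is only a sketch: demo_tmp is a hypothetical scratch directory, the nested 
subshell plays the part of shell 2, and it assumes a platform that allows 
removing a directory another process is sitting in (GNU/Linux on a local 
filesystem does; there the symptom is a missing file rather than the NFS 
error message, and the check is expected to report "no"):

```shell
# Stand-in names (demo_tmp, foo, blah) mirror the session above.
demo_tmp=$(mktemp -d)
mkdir "$demo_tmp/foo"

seen_after_replace=$(
  # "Shell 1": sits inside foo for the whole experiment.
  cd "$demo_tmp/foo" || exit 1
  # "Shell 2": works from elsewhere, replaces foo and populates it.
  (cd / && rm -Rf "$demo_tmp/foo" && mkdir "$demo_tmp/foo" &&
   touch "$demo_tmp/foo/blah") || exit 1
  # Back in "shell 1": is blah visible through the current directory?
  if test -f ./blah; then echo yes; else echo no; fi
)
echo "blah visible from the old working directory: $seen_after_replace"
rm -rf "$demo_tmp"
```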

So I'm thinking about applying this patch.  Rather than changing autoconf's 
testsuite to use 'exec ./run' (which does indeed make the failing tests once 
again pass for cygwin 1.7, but doesn't help the interactive user who won't want 
to end their session by using exec), I decided to fix autotest to stop trying 
to remove the entire directory, and instead remove only its contents.  I 
believe I got the glob correct for deleting all hidden files but neither '.' 
nor '..'.
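
As a quick sanity check of that glob, here is a sketch that applies the same 
three patterns used in the patch to a scratch directory.  demo_group_dir is a 
hypothetical stand-in for $at_group_dir, and the file names are ones I picked 
to exercise the corner cases; the cleanup runs while a shell is still inside 
the directory, then a new file is created through the absolute path and 
looked up through '.':

```shell
# Hypothetical stand-in for $at_group_dir, filled with awkward names.
demo_group_dir=$(mktemp -d)
mkdir "$demo_group_dir/sub"
touch "$demo_group_dir/plain" "$demo_group_dir/.a" "$demo_group_dir/.ab" \
      "$demo_group_dir/..a" "$demo_group_dir/..." "$demo_group_dir/sub/.x"

visible_after_clean=$(
  # Stay inside the directory while it is cleaned, like a user's shell.
  cd "$demo_group_dir" || exit 1
  # Same three globs as the patch: non-hidden names, two-character
  # hidden names other than '..', and hidden names of three or more
  # characters.  Unmatched globs stay literal; rm -f ignores them.
  rm -fr "$demo_group_dir"/* "$demo_group_dir"/.[!.] "$demo_group_dir"/.??*
  # Repopulate through the absolute path, then look through '.'.
  touch "$demo_group_dir/new"
  ls -A
)
echo "entries seen from inside after clean and repopulate: $visible_after_clean"
rm -rf "$demo_group_dir"
```

If the patterns cover every entry, the only name listed is the freshly 
created one, and the shell inside the directory never loses its footing.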


From: Eric Blake <[email protected]>
Date: Thu, 9 Apr 2009 11:13:51 -0600
Subject: [PATCH] Avoid problems caused by deleting in-use directory.

* lib/autotest/general.m4 (AT_INIT) <at_fn_group_prepare>: Only
remove the contents of $at_group_dir, not the directory itself.

Signed-off-by: Eric Blake <[email protected]>
---
 ChangeLog               |    4 ++++
 lib/autotest/general.m4 |    9 ++++++---
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 65c0250..fcbc835 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,9 @@
 2009-04-09  Eric Blake  <[email protected]>

+       Avoid problems caused by deleting in-use directory.
+       * lib/autotest/general.m4 (AT_INIT) <at_fn_group_prepare>: Only
+       remove the contents of $at_group_dir, not the directory itself.
+
        Fix regression in empty test.
        * lib/autotest/general.m4 (AT_SETUP): Prep AT_ingroup for fallback
        use in empty test.  Fixes regression introduced 2009-04-06.
diff --git a/lib/autotest/general.m4 b/lib/autotest/general.m4
index b00c79b..9c6538e 100644
--- a/lib/autotest/general.m4
+++ b/lib/autotest/general.m4
@@ -999,8 +999,7 @@ m4_divert_pop([PREPARE_TESTS])dnl
 m4_divert_push([TESTS])dnl

 # Create the master directory if it doesn't already exist.
-test -d "$at_suite_dir" ||
-  mkdir "$at_suite_dir" ||
+AS_MKDIR_P(["$at_suite_dir"]) ||
   AS_ERROR([cannot create `$at_suite_dir'])

 # Can we diff with `/dev/null'?  DU 5.0 refuses.
@@ -1094,11 +1093,15 @@ at_fn_group_prepare ()
   _AT_NORMALIZE_TEST_GROUP_NUMBER(at_group_normalized)

   # Create a fresh directory for the next test group, and enter.
+  # If one already exists, the user may have invoked ./run from
+  # within that directory; we remove the contents, but not the
+  # directory itself, so that we aren't pulling the rug out from
+  # under the shell's notion of the current directory.
   at_group_dir=$at_suite_dir/$at_group_normalized
   at_group_log=$at_group_dir/$as_me.log
   if test -d "$at_group_dir"; then
     find "$at_group_dir" -type d ! -perm -700 -exec chmod u+rwx \{\} \;
-    rm -fr "$at_group_dir" ||
+    rm -fr "$at_group_dir"/* "$at_group_dir"/.[!.] "$at_group_dir"/.??* ||
     AS_WARN([test directory for $at_group_normalized could not be cleaned.])
   fi
   # Be tolerant if the above `rm' was not able to remove the directory.
-- 
1.6.1.2
