I discovered this on cygwin 1.7, which is attempting to add the ability to emulate deleting a directory even when a process owns a handle to that directory. Cygwin 1.5 flat out fails, as permitted by POSIX, because Windows does not permit deleting an open handle, and autotest already has code that gracefully deals with this failure. The cygwin 1.7 emulation, by contrast, works by detecting a failure to delete because of an open handle, and falls back to renaming the affected handle to move it into the recycle bin. In most cases, this makes it appear that an in-use handle has been successfully unlinked: no new process can access the deleted contents, but existing processes can hold on to the file as long as they need, and the file finally disappears when the last handle closes. Unfortunately, with the current state of cygwin 1.7, the rename does not affect subsequent readdir, so a listing of the parent directory still shows the supposedly deleted subdir; it also does not permit creating a new file or directory with the name of the old deleted file. As a result, several tests fail on cygwin 1.7 that used to pass on cygwin 1.5; all of them have semantics similar to:
(cd micro-test.dir/1 && ./run)

The problem is that the shell invoking ./run must stick around to wait for its return status, thus keeping a handle open on micro-test.dir. However, the autotest-generated run scripts proceed to cd back to the top testsuite directory, nuke the per-test directory, and repopulate it from scratch. Under cygwin 1.5, nuking the per-test directory fails (it is still in use by the shell waiting for ./run to complete), but we ignore that failure and are able to reuse the (now-empty) directory, effectively repopulating it from scratch anyway. Under cygwin 1.7, however, the nuke succeeds, and we are then unable to recreate the directory because of the Windows limitation mentioned above.

Changing the testsuite to use 'exec ./run' instead of './run' solves the problem, because there is no longer a parent shell waiting for status from within that directory, and thus no longer any process keeping the directory handle in use across run's attempt to nuke and rebuild the per-test directory.

But thinking about the issue further, I realized it affects more than just cygwin. Consider a Unix system with an NFS mount (I tested on Solaris 8). There is no restriction against deleting a directory that is in use by another process, nor against recreating a new directory by the same name. But the process that had the directory ripped out from under it does NOT see the recreated directory of the same name:

1$ cd /tmp
1$ mkdir foo
1$ cd foo
1$ touch bar
1$ ls
bar
2$ cd /tmp
2$ rm -Rf foo
2$ mkdir foo
2$ cd foo
2$ touch blah
2$ ls
blah
1$ ls
ls: reading directory .: Stale NFS file handle

In other words, the fact that testsuite is trying to nuke the _entire_ per-test directory, rather than just its contents, means that any user who does 'cd testsuite.dir/nnn; ./run' has given their current shell a stale directory handle for $PWD.

So I'm thinking about applying this patch.
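
For anyone who wants to poke at the two-shell scenario above without an NFS mount, here is a minimal single-script sketch. It assumes a local filesystem that allows removing a directory that is some process's current directory (typical on Linux; it would not apply as-is to the cygwin cases described earlier), and it assumes mktemp -d is available, which POSIX does not guarantee. The names top, old_view and new_view are made up for the illustration. No NFS error is expected here; the point is only that the old working directory never sees the recreated directory's contents.

```shell
#!/bin/sh
# Sketch: a process whose cwd is deleted and recreated by name keeps
# referring to the old (now deleted) directory, not the new one.
# Assumption: local filesystem that permits removing an in-use directory.
top=$(mktemp -d) || exit 1
mkdir "$top/foo"

old_view=$(
  cd "$top/foo" || exit 1   # plays the role of shell 1, sitting inside foo
  rm -rf "$top/foo"         # shell 2 removes the directory ...
  mkdir "$top/foo"          # ... and recreates one with the same name
  : > "$top/foo/blah"
  # Relative lookups still resolve against the old, deleted directory.
  if test -f ./blah; then echo visible; else echo hidden; fi
)

# A fresh lookup by path reaches the recreated directory.
if test -f "$top/foo/blah"; then new_view=visible; else new_view=hidden; fi

echo "via old cwd: $old_view; via path: $new_view"
rm -rf "$top"
```

Under those assumptions the file should be hidden via the old cwd and visible via the path, which is the local-filesystem analogue of the stale $PWD described above.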
Rather than changing autoconf's testsuite to use 'exec ./run' (which does indeed make the failing tests once again pass for cygwin 1.7, but doesn't help the interactive user who won't want to end their session by using exec), I decided to fix autotest to quit trying to remove the entire directory, but instead only remove its contents. I believe I got the glob correct for deleting all hidden files but not '.' nor '..'.

From: Eric Blake <[email protected]>
Date: Thu, 9 Apr 2009 11:13:51 -0600
Subject: [PATCH] Avoid problems caused by deleting in-use directory.

* lib/autotest/general.m4 (AT_INIT) <at_fn_group_prepare>: Only
remove the contents of $at_group_dir, not the directory itself.

Signed-off-by: Eric Blake <[email protected]>
---
 ChangeLog               |    4 ++++
 lib/autotest/general.m4 |    9 ++++++---
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 65c0250..fcbc835 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,9 @@
 2009-04-09  Eric Blake  <[email protected]>
 
+	Avoid problems caused by deleting in-use directory.
+	* lib/autotest/general.m4 (AT_INIT) <at_fn_group_prepare>: Only
+	remove the contents of $at_group_dir, not the directory itself.
+
 	Fix regression in empty test.
 	* lib/autotest/general.m4 (AT_SETUP): Prep AT_ingroup for fallback
 	use in empty test.  Fixes regression introduced 2009-04-06.
diff --git a/lib/autotest/general.m4 b/lib/autotest/general.m4
index b00c79b..9c6538e 100644
--- a/lib/autotest/general.m4
+++ b/lib/autotest/general.m4
@@ -999,8 +999,7 @@ m4_divert_pop([PREPARE_TESTS])dnl
 m4_divert_push([TESTS])dnl
 
 # Create the master directory if it doesn't already exist.
-test -d "$at_suite_dir" ||
-  mkdir "$at_suite_dir" ||
+AS_MKDIR_P(["$at_suite_dir"]) ||
   AS_ERROR([cannot create `$at_suite_dir'])
 
 # Can we diff with `/dev/null'?  DU 5.0 refuses.
@@ -1094,11 +1093,15 @@ at_fn_group_prepare ()
   _AT_NORMALIZE_TEST_GROUP_NUMBER(at_group_normalized)
 
   # Create a fresh directory for the next test group, and enter.
+  # If one already exists, the user may have invoked ./run from
+  # within that directory; we remove the contents, but not the
+  # directory itself, so that we aren't pulling the rug out from
+  # under the shell's notion of the current directory.
   at_group_dir=$at_suite_dir/$at_group_normalized
   at_group_log=$at_group_dir/$as_me.log
   if test -d "$at_group_dir"; then
     find "$at_group_dir" -type d ! -perm -700 -exec chmod u+rwx \{\} \;
-    rm -fr "$at_group_dir" ||
+    rm -fr "$at_group_dir"/* "$at_group_dir"/.[!.] "$at_group_dir"/.??* ||
       AS_WARN([test directory for $at_group_normalized could not be cleaned.])
   fi
   # Be tolerant if the above `rm' was not able to remove the directory.
-- 
1.6.1.2
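
To sanity-check the three-pattern glob from the patch, a small standalone sketch like the following can be run outside autotest. It is not part of the patch; the directory layout and the names dir, dir_state and leftover are invented for the test, and it assumes mktemp -d is available. The idea is that "*" covers ordinary names, ".[!.]" covers two-character dot names such as ".a", and ".??*" covers every dot name of three or more characters (including ones that begin with two dots), so that neither "." nor ".." is ever matched.

```shell
#!/bin/sh
# Sketch: empty a directory, dot files included, without removing the
# directory itself, using the same three patterns as the patch.
dir=$(mktemp -d) || exit 1
mkdir "$dir/sub" "$dir/.hiddendir"
: > "$dir/plain"          # matched by *
: > "$dir/.a"             # two-character dot name: matched by .[!.]
: > "$dir/.ab"            # three or more characters: matched by .??*
: > "$dir/..c"            # begins with two dots: also matched by .??*
: > "$dir/sub/nested"     # removed recursively along with sub

rm -fr "$dir"/* "$dir"/.[!.] "$dir"/.??*

if test -d "$dir"; then dir_state=kept; else dir_state=gone; fi
leftover=$(ls -A "$dir")  # -A lists dot files but omits . and ..
echo "directory: $dir_state; leftover entries: '$leftover'"
rmdir "$dir"
```

In a real test directory some of the patterns may match nothing, in which case the shell passes the pattern through literally and rm -f quietly ignores the nonexistent operand, so the command still succeeds.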
