This is v2 of changes [1] to remove some mkdir/rmdir races around
safe_create_leading_directories() and a couple of its callers.  Thanks
to Jonathan Nieder for his thorough review of v1 and to Ramsay Jones
for pointing out a typo.  I think I have addressed all of their
suggestions.

This patch series has a bigger scope than v1.  The main differences is
that I took Jonathan's (implicit?) suggestion to move the retrying to
a level higher, where files are actually created in the directories
and therefore the directories are definitely no longer subject to
pruning.

Differences from v1:

* Improve some commit messages

* Break up some changes to safe_create_leading_directories() into
  smaller steps.

* Fix a problem in safe_create_leading_directories() when handling
  paths with multiple slashes (e.g., "foo//bar").  (I noticed this
  pre-existing problem while making the other changes.)

* Change how retrying is done:

  * safe_create_leading_directories() doesn't retry internally;
    instead, its return value is turned into an enum with a new value,
    SCLD_VANISHED, that indicates that a directory along the path
    vanished while it was working.  This return value is an indication
    that its caller might want to try calling the function again.

  * Change rename_ref() to retry if either
    safe_create_leading_directories() returns SCLD_VANISHED or if its
    own call to rename() returns ENOENT.

* Fix a similar mkdir/rmdir race in lock_ref_sha1_basic().  This one
  is actually more interesting than the one in rename_ref() because it
  can be triggered internal to git.

* Fix a race in remove_dir_recurse(): if somebody else deletes a file
  or directory that the function wanted to delete anyway, don't treat
  it as an error.

The main git-internal race that I know to be fixed by these changes is
when pack-refs is trying to delete empty directories at the same time
as another process is trying to create a new reference.  It can be
reproduced by this script:

    BRANCHES="foo/bar/xyzzy foo/bar/plugh"
    HEADS="h1 h2"

    rm -rf test-repo
    $GIT init test-repo
    cd test-repo
    $GIT config user.name 'Lou User'
    $GIT config user.email 'lu...@example.com'

    for h in $HEADS
    do
        $GIT commit --allow-empty -m $h
        $GIT branch $h
    done

    (
        while true
        do
        for b in $BRANCHES
        do
            for h in $HEADS
            do
                $GIT update-ref refs/heads/$b $h
            done
        done
        done
    ) &
    pid1=$!

    (
        while true
        do
        $GIT pack-refs --all
        done
    ) &
    pid2=$!

The script has to fail if update-ref and pack-refs try to lock the
same reference at the same time, because we don't handle lock
contention:

    fatal: Unable to create 
'/home/mhagger/tmp/test-repo/.git/refs/heads/foo/bar/plugh.lock': File exists.

    If no other git process is currently running, this probably means a
    git process crashed in this repository earlier. Make sure no other git
    process is running and remove the file manually to continue.

Also, if update-ref changes the value of a reference while pack-refs
is running, then pack-refs emits a warning and leaves the new value of
the loose reference in place:

    error: Ref refs/heads/foo/bar/xyzzy is at 
003a9cedc3ec79b8f589b158ffe91177f0a611b3 but expected 
34bf1e34d27a639732a01fef0a791e58c2c0882f

But it used to have other unnecessary failures, too, related to
mkdir/rmdir races between the two programs:

    error: unable to create directory for .git/refs/heads/foo/bar/plugh

and

    fatal: Unable to create 
'/home/mhagger/tmp/test-repo/.git/refs/heads/foo/bar/plugh.lock': No such file 
or directory

This patch series eliminates the last two types of failures.  Here are
tallies of failures from running the above script for 60 seconds
before and after this change.  Before:

    Total updates: 46900
    Unable to create '*.lock': File exists: 1992
    * is at * but expected *: 940
    unable to create directory: 187
    Unable to create '*.lock': No such file or directory: 197

After:

    Total updates: 47892
    Unable to create '*.lock': File exists: 2105
    * is at * but expected *: 835
    unable to create directory: 0
    Unable to create '*.lock': No such file or directory: 0

Clearly, more work is still needed in this area.  For example, the "*
is at * but expected *" errors are not really errors at all and should
not be reported as such.  And we might want to consider adding a short
delay and then a retry when acquiring locks.  If I run the same test
script with the following patch, then most or all of the "Unable to
create '*.lock': File exists" errors go away too.

======================================================================
diff --git a/refs.c b/refs.c
index 810f802..b3ff1f5 100644
--- a/refs.c
+++ b/refs.c
@@ -2117,7 +2117,10 @@ static struct ref_lock *lock_ref_sha1_basic(const char 
*refname,
                         * again:
                         */
                        goto retry;
-               else
+               else if (errno == EEXIST && --attempts > 0) {
+                       usleep(1000);
+                       goto retry;
+               } else
                        unable_to_lock_index_die(ref_file, errno);
        }
        return old_sha1 ? verify_lock(lock, old_sha1, mustexist) : lock;
======================================================================

I hope to do some more work here in the near future.

[1] Version 1: http://thread.gmane.org/gmane.comp.version-control.git/239638

Michael Haggerty (17):
  safe_create_leading_directories(): fix format of "if" chaining
  safe_create_leading_directories(): reduce scope of local variable
  safe_create_leading_directories(): add explicit "slash" pointer
  safe_create_leading_directories(): rename local variable
  safe_create_leading_directories(): split on first of multiple slashes
  safe_create_leading_directories(): always restore slash at end of loop
  safe_create_leading_directories(): introduce enum for return values
  cmd_init_db(): when creating directories, handle errors conservatively
  safe_create_leading_directories(): add new error value SCLD_VANISHED
  lock_ref_sha1_basic(): on SCLD_VANISHED, retry
  lock_ref_sha1_basic(): if locking fails with ENOENT, retry
  remove_dir_recurse(): tighten condition for removing unreadable dir
  remove_dir_recurse(): handle disappearing files and directories
  rename_ref(): extract function rename_tmp_log()
  rename_tmp_log(): handle a possible mkdir/rmdir race
  rename_tmp_log(): limit the number of remote_empty_directories()
    attempts
  rename_tmp_log(): on SCLD_VANISHED, retry

 builtin/init-db.c |  9 +++---
 cache.h           | 25 +++++++++++++--
 dir.c             | 27 +++++++++++-----
 merge-recursive.c |  2 +-
 refs.c            | 92 ++++++++++++++++++++++++++++++++++++++++---------------
 sha1_file.c       | 67 ++++++++++++++++++++++------------------
 6 files changed, 155 insertions(+), 67 deletions(-)

-- 
1.8.5.2

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to