On 12/19/2013 01:04 AM, Jeff King wrote:
> On Wed, Dec 18, 2013 at 11:44:58PM +0100, Michael Haggerty wrote:
>> [While doing so, I got sidetracked by the question: what happens if a
>> prune process deletes the "objects/XX" directory just the same moment
>> that another process is trying to write an object into that directory?
>> I think the relevant function is sha1_file.c:create_tmpfile(). It looks
>> like there is a nonzero but very small race window that could result in
>> a spurious "unable to create temporary file" error, but even then I
>> don't think there would be any corruption or anything.]
> There's a race there, but I think it's hard to trigger.
> Our strategy with object creation is to call open, recognize ENOENT,
> mkdir, and then try again. If the rmdir happens before our call to open,
> then we're fine. If open happens first, then the rmdir will fail.
> But we don't loop on ENOENT. So if the rmdir happens in the middle,
> after the mkdir but before we call open again, we'd fail, because we
> don't treat ENOENT specially in the second call to open. That is
> unlikely to happen, though, as prune would not be removing a directory
> it did not just enter and clean up an object from (in which case we
> would not have gotten the first ENOENT in the creator). [...]
The way I read it, prune tries to delete the directory whether or not
there were any files in it. So the race could be triggered by a single
writer that wants to write an object to a not-yet-existent shard
directory and a single prune process that encounters the directory
between when it is created and when the object file is added.
But that doesn't mean I disagree with your conclusion:
> So it seems unlikely and the worst case is a temporary failure, not a
> corruption. It's probably not worth caring too much about, but we could
> solve it pretty easily by looping on ENOENT on creation.
> On a similar note, I imagine that a simultaneous "branch foo/bar" and
> "branch -d foo/baz" could race over the creation/deletion of
> "refs/heads/foo", but I didn't look into it.
Deleting a loose reference doesn't cause the directory containing it to
be deleted. The directory is only deleted by pack-refs (and then only
when a reference in the directory was just packed) or when there is an
attempt to create a new reference that conflicts with the directory. So
the question is whether the creation of a loose ref file is robust
against the disappearance of a directory that it just created.
And the answer is "no". It looks like there are a bunch of places where
similar races occur involving references. And probably many others
elsewhere in the code. (Any caller of safe_create_leading_directories()
is a candidate problem point, and in fact that function itself has an
internal race.) I've started fixing some of these but it might take a
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html