On Thu, Dec 19, 2013 at 05:33:55PM +0100, Michael Haggerty wrote:

> > But we don't loop on ENOENT. So if the rmdir happens in the middle,
> > after the mkdir but before we call open again, we'd fail, because we
> > don't treat ENOENT specially in the second call to open. That is
> > unlikely to happen, though, as prune would not be removing a directory
> > it did not just enter and clean up an object from (in which case we
> > would not have gotten the first ENOENT in the creator). [...]
> The way I read it, prune tries to delete the directory whether or not
> there were any files in it.  So the race could be triggered by a single
> writer that wants to write an object to a not-yet-existent shard
> directory and a single prune process that encounters the directory
> between when it is created and when the object file is added.

Yes, that's true. It does make the race slightly harder to hit than a
straight deletion would, because prune has to catch the directory in the
window where it exists but does not yet contain an object. But it's
still possible.
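
For illustration, here is a minimal sketch of what looping on ENOENT
could look like on the creator side. This is not git's actual
object-writing code: create_in_shard(), its parameters, and the retry
bound are all hypothetical names made up for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (an assumption, not git's code): exclusively
 * create "path" inside the directory "dir", creating "dir" on demand
 * and retrying if it disappears between mkdir() and open().
 * Returns a file descriptor, or -1 with errno set.
 */
static int create_in_shard(const char *dir, const char *path, int max_tries)
{
	int tries;

	for (tries = 0; tries < max_tries; tries++) {
		int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0444);

		/* Success, or a failure unrelated to a missing parent. */
		if (fd >= 0 || errno != ENOENT)
			return fd;

		/* Parent is missing; EEXIST means somebody else made it. */
		if (mkdir(dir, 0777) < 0 && errno != EEXIST)
			return -1;

		/* Loop: a concurrent rmdir() may remove "dir" again. */
	}
	errno = ENOENT;
	return -1;
}
```

The retry bound keeps a pathological prune/create ping-pong from
spinning forever; the caller still sees a temporary failure once the
bound is exhausted.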

> But that doesn't mean I disagree with your conclusion:

I think we're in violent agreement at this point. :)

> Regarding references:
> > On a similar note, I imagine that a simultaneous "branch foo/bar" and
> > "branch -d foo/baz" could race over the creation/deletion of
> > "refs/heads/foo", but I didn't look into it.
> Deleting a loose reference doesn't cause the directory containing it to
> be deleted.  The directory is only deleted by pack-refs (and then only
> when a reference in the directory was just packed) or when there is an
> attempt to create a new reference that conflicts with the directory.  So
> the question is whether the creation of a loose ref file is robust
> against the disappearance of a directory that it just created.

Ah, right, I forgot we leave the directories sitting around after
deletion. So we may run into a collision with another creator, but by
definition we would have a D/F conflict with such a creator anyway, so
we cannot both succeed.

But we can hit the problem with pack-refs, as you note:

> And the answer is "no".  It looks like there are a bunch of places where
> similar races occur involving references.  And probably many others
> elsewhere in the code.  (Any caller of safe_create_leading_directories()
> is a candidate problem point, and in fact that function itself has an
> internal race.)  I've started fixing some of these but it might take a
> while.

Yeah, I think you'd have to teach safe_create_leading_directories to
atomically try-to-create-and-check-errno rather than stat+mkdir. And
then teach it to backtrack when an expected leading path goes missing
after we created it (so mkdir("foo"), then mkdir("foo/bar"), then step
back to mkdir("foo") if we got ENOENT).
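
A rough sketch of that idea, under stated assumptions: this is not the
real safe_create_leading_directories(), and create_leading_dirs() and
max_restarts are hypothetical names. For simplicity it restarts from the
first path component on ENOENT instead of stepping back exactly one
level, and it does not check that an existing entry is really a
directory.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical sketch: create every leading directory of "path"
 * (everything before the final '/'), relying on mkdir()'s errno
 * rather than a prior stat(). "path" is modified temporarily but
 * restored before returning. Returns 0 on success, -1 with errno set.
 */
static int create_leading_dirs(char *path, int max_restarts)
{
	char *slash = path;
	int restarts = 0;

	if (!*path)
		return 0;

	while ((slash = strchr(slash + 1, '/')) != NULL) {
		int ret, saved_errno;

		*slash = '\0';
		ret = mkdir(path, 0777);
		saved_errno = errno;
		*slash = '/';

		if (!ret || saved_errno == EEXIST)
			continue;	/* created, or somebody beat us to it */
		if (saved_errno == ENOENT && restarts++ < max_restarts) {
			/* A parent vanished under us: start over. */
			slash = path;
			continue;
		}
		errno = saved_errno;
		return -1;
	}
	return 0;
}
```

Treating EEXIST as success is what removes the stat+mkdir window; the
bounded restart covers the case where a directory we just created is
removed before we get to create its child.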

I don't think the races are a big deal, though. As with the prune case,
we will ultimately fail to create the lockfile and get a temporary
failure rather than a corruption. So unless we actually have reports of
it happening (and I have seen none), it's probably not worth spending
much time on.
