On Tue, Oct 21, 2014 at 01:36:47PM -0700, Ronnie Sahlberg wrote:
> commit e193c10fc4f9274d1e751cfcdcc4507818e8d498 upstream.
>
> Change lock_ref_sha1_basic to return an error instead of dying when
> we fail to lock a file during a transaction.
> This function is only called from transaction_commit() and it knows how
> to handle these failures.
> [...]
> - else
> - unable_to_lock_die(ref_file, errno);
> + else {
> + struct strbuf err = STRBUF_INIT;
> + unable_to_lock_message(ref_file, errno, &err);
> + error("%s", err.buf);
> + strbuf_reset(&err);
> + goto error_return;
> + }
I coincidentally just wrote almost the identical patch, because this
isn't just a cleanup; it fixes a real bug. During pack_refs, we call
prune_ref to lock and delete the loose ref. If the lock fails, that's
OK; that just means somebody else is updating it at the same time, and
we can skip our pruning step. But due to the unable_to_lock_die call
here in lock_ref_sha1_basic, the pack-refs process may die prematurely.
I wonder if it is worth pulling this one out from the rest of the
series, as it has value (and can be applied) on its own. I did some
digging on the history of this, too. Here's the rationale I wrote:
lock_ref_sha1_basic: do not die on locking errors
lock_ref_sha1_basic is inconsistent about when it calls
die() and when it returns NULL to signal an error. This is
annoying to any callers that want to recover from a locking
error.
This seems to be mostly historical accident. It was added in
4bd18c4 (Improve abstraction of ref lock/write.,
2006-05-17), which returned an error in all cases except
calling safe_create_leading_directories, in which case it
died. Later, 40aaae8 (Better error message when we are
unable to lock the index file, 2006-08-12) asked
hold_lock_file_for_update to die for us, leaving the
resolve_ref code-path the only one which returned NULL.
We tried to correct that in 5cc3cef (lock_ref_sha1(): do not
sometimes error() and sometimes die()., 2006-09-30),
by converting all of the die() calls into returns. But we
missed the "die" flag passed to the lock code, leaving us
inconsistent. This state persisted until e5c223e
(lock_ref_sha1_basic(): if locking fails with ENOENT, retry,
2014-01-18). Because of its retry scheme, it does not ask
the lock code to die, but instead manually dies with
unable_to_lock_die().
We can make this consistent with the other return paths by
converting this to use unable_to_lock_message(), and
returning NULL. This is safe to do because all callers
already needed to check the return value of the function,
since it could fail (and return NULL) for other reasons.
I also have some other cleanups to lock_ref_sha1_basic's error handling.
I'd be happy to take over this patch and send it along with those
cleanups as a separate series.
-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html