Re: [PATCH] config: use a static lock_file struct
On Wed, Aug 30, 2017 at 12:55:55AM -0400, Jeff King wrote:

> > I feel like right now we meet (1) and not (2). But I think if we keep to
> > that lower bar of (1), it might not be that bad. We're assuming now that
> > there's no race on the tempfile->active flag, for instance. We could
> > probably make a similar assumption about putting items onto or taking
> > them off of a linked list (it's not really atomic, but a single pointer
> > assignment is probably "atomic enough" for our purposes).
>
> Something like this, which AFAICT is about as safe as the existing code
> in its list manipulation. It retains the "active" flag as an additional
> check which I think isn't strictly necessary, but potentially catches
> some logic errors.

The patch below demonstrates how this could be used to turn some
"xcalloc" lock_files into stack variables that are allowed to go out of
scope after we commit or rollback. This solves the three lock-related
leaks reported by valgrind when running t.

_But_ it also demonstrates an interesting downside of this approach.
Some functions are lazy in their error paths. For instance, look at
write_index_as_tree(). We take the lock early in the function, but may
return before a commit or rollback if we hit an error. With the current
code this "leaks" the tempfile, which is wrong. But nobody notices
because in practice the program exits soon after and we clean up the
tempfile then.

But with a stack variable lock_file, that minor problem becomes a major
one: our stack variable got added to a global linked list and the
atexit() handler will try to read it. But now of course it will contain
garbage, since the variable went out of scope.

So it's probably safer to just let tempfile.c handle the whole lifetime,
and have it put all live tempfiles on the heap, and free them when
they're deactivated.
That means our creation signature becomes more like:

  struct tempfile *create_tempfile(const char *path);

and delete_tempfile() actually calls free() on it (though we'd probably
want to skip the free() from a signal handler for the usual reasons).

I don't have a patch for that, but just food for thought while exploring
the implications of my earlier suggestion.

---
diff --git a/builtin/update-index.c b/builtin/update-index.c
index d955cd56b3..bf7420b808 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -917,7 +917,7 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 	struct refresh_params refresh_args = {0, &has_errors};
 	int lock_error = 0;
 	int split_index = -1;
-	struct lock_file *lock_file;
+	struct lock_file lock_file = LOCK_INIT;
 	struct parse_opt_ctx_t ctx;
 	strbuf_getline_fn getline_fn;
 	int parseopt_state = PARSE_OPT_UNKNOWN;
@@ -1016,11 +1016,8 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 
 	git_config(git_default_config, NULL);
 
-	/* We can't free this memory, it becomes part of a linked list parsed atexit() */
-	lock_file = xcalloc(1, sizeof(struct lock_file));
-
 	/* we will diagnose later if it turns out that we need to update it */
-	newfd = hold_locked_index(lock_file, 0);
+	newfd = hold_locked_index(&lock_file, 0);
 	if (newfd < 0)
 		lock_error = errno;
@@ -1155,11 +1152,11 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 			exit(128);
 		unable_to_lock_die(get_index_file(), lock_error);
 	}
-	if (write_locked_index(&the_index, lock_file, COMMIT_LOCK))
+	if (write_locked_index(&the_index, &lock_file, COMMIT_LOCK))
 		die("Unable to write new index file");
 	}
 
-	rollback_lock_file(lock_file);
+	rollback_lock_file(&lock_file);
 	return has_errors ?
1 : 0;
 }

diff --git a/cache-tree.c b/cache-tree.c
index 2440d1dc89..440f92aeca 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -603,19 +603,15 @@ static struct cache_tree *cache_tree_find(struct cache_tree *it, const char *pat
 int write_index_as_tree(unsigned char *sha1, struct index_state *index_state, const char *index_path, int flags, const char *prefix)
 {
 	int entries, was_valid, newfd;
-	struct lock_file *lock_file;
+	struct lock_file lock_file = LOCK_INIT;
 
-	/*
-	 * We can't free this memory, it becomes part of a linked list
-	 * parsed atexit()
-	 */
-	lock_file = xcalloc(1, sizeof(struct lock_file));
-
-	newfd = hold_lock_file_for_update(lock_file, index_path, LOCK_DIE_ON_ERROR);
+	newfd = hold_lock_file_for_update(&lock_file, index_path, LOCK_DIE_ON_ERROR);
 
 	entries = read_index_from(index_state, index_path);
-	if (entries < 0)
+	if (entries < 0) {
+		rollback_lock_file(&lock_file);
 		return WRITE_TREE_UNREADABLE_INDEX;
+	}
 
 	if (flags & WRITE_TREE_IGNORE_CACHE_TREE)
Re: [PATCH] pkt-line: re-'static'-ify buffer in packet_write_fmt_1()
On Tue, Aug 29, 2017 at 05:48:09PM -0400, Jeff King wrote:
> On Tue, Aug 29, 2017 at 09:22:08PM +0200, Martin Ågren wrote:
>
> > One take-away for me from this thread is that memory-leak-plugging seems
> > to be appreciated, i.e., it's not just an academic problem. I think I'll
> > look into it some more, i.e., slowly pursue option 2. :-) That might help
> > give some insights on how to identify interesting leaks.
>
> Yes, I think if it's tractable that's the best path forward. It's not
> clear to me how many of the things valgrind reports are false positives
> (and in what way). Certainly the lock_file patch I just posted seems
> like something worth addressing, even if we _tend_ to only leak a small
> number of such structs per program invocation. It's not a problem now,
> but it's a sign of a bad pattern.

I wanted to see what it would take to get t to pass with "definite"
leak-checking on, as in:

diff --git a/t/valgrind/valgrind.sh b/t/valgrind/valgrind.sh
index 669ebaf68b..5f01c06280 100755
--- a/t/valgrind/valgrind.sh
+++ b/t/valgrind/valgrind.sh
@@ -10,7 +10,7 @@
 test-*)
 	;;
 esac
 
-TOOL_OPTIONS='--leak-check=no'
+TOOL_OPTIONS='--leak-check=full --errors-for-leak-kinds=definite'
 
 test -z "$GIT_VALGRIND_ENABLED" &&
 exec "$program" "$@"

The results gave an interesting overview of the problems I'd expect to
face for the rest of the tests. I started with the config lock_file fix
I sent earlier, but we needed a few more.
This one in cmd_update_index() can become static in the same way:

diff --git a/builtin/update-index.c b/builtin/update-index.c
index d955cd56b3..3bff7de720 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -908,6 +908,7 @@ static int reupdate_callback(struct parse_opt_ctx_t *ctx,
 int cmd_update_index(int argc, const char **argv, const char *prefix)
 {
+	static struct lock_file lock_file;
 	int newfd, entries, has_errors = 0, nul_term_line = 0;
 	enum uc_mode untracked_cache = UC_UNSPECIFIED;
 	int read_from_stdin = 0;
@@ -917,7 +918,6 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 	struct refresh_params refresh_args = {0, &has_errors};
 	int lock_error = 0;
 	int split_index = -1;
-	struct lock_file *lock_file;
 	struct parse_opt_ctx_t ctx;
 	strbuf_getline_fn getline_fn;
 	int parseopt_state = PARSE_OPT_UNKNOWN;
@@ -1016,11 +1016,8 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 
 	git_config(git_default_config, NULL);
 
-	/* We can't free this memory, it becomes part of a linked list parsed atexit() */
-	lock_file = xcalloc(1, sizeof(struct lock_file));
-
 	/* we will diagnose later if it turns out that we need to update it */
-	newfd = hold_locked_index(lock_file, 0);
+	newfd = hold_locked_index(&lock_file, 0);
 	if (newfd < 0)
 		lock_error = errno;
@@ -1155,11 +1152,11 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 			exit(128);
 		unable_to_lock_die(get_index_file(), lock_error);
 	}
-	if (write_locked_index(&the_index, lock_file, COMMIT_LOCK))
+	if (write_locked_index(&the_index, &lock_file, COMMIT_LOCK))
 		die("Unable to write new index file");
 	}
 
-	rollback_lock_file(lock_file);
+	rollback_lock_file(&lock_file);
 	return has_errors ? 1 : 0;
 }

But there's another one in write_index_as_tree(), which is less
obviously a candidate for turning into a single static variable. We
might plausibly have multiple indices.
I hacked around it with:

diff --git a/cache-tree.c b/cache-tree.c
index 2440d1dc89..f2a5aa138f 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -603,7 +603,7 @@ static struct cache_tree *cache_tree_find(struct cache_tree *it, const char *pat
 int write_index_as_tree(unsigned char *sha1, struct index_state *index_state, const char *index_path, int flags, const char *prefix)
 {
 	int entries, was_valid, newfd;
-	struct lock_file *lock_file;
+	static struct lock_file *lock_file;
 
 	/*
 	 * We can't free this memory, it becomes part of a linked list

which means that if we call this function once per program we'll keep a
single static pointer, and valgrind will consider it "reachable". But if
we call it multiple times, they really will leak as before. So it's
definitely a hack, though it was enough for t.

I think I'd favor loosening the "you can never free locks" requirement
instead, which makes this problem just go away (we could even declare it
as a stack variable then).

Next up is the fact that setup_git_env() happily overwrites its own
pointers without freeing them if it's called twice. I expect this will
be a reasonably common problem for things like config or option parsing
where we might see multiple values. In the simple cases, we just need to
make sure we do:

  free(foo);
  foo
Re: [PATCH] config: use a static lock_file struct
On Wed, Aug 30, 2017 at 12:31:47AM -0400, Jeff King wrote:

> > It was surprisingly hard trying to get that code to do the right thing,
> > non-racily, in every error path. Since `remove_tempfiles()` can be
> > called any time (even from a signal handler), the linked list of
> > `tempfile` objects has to be kept valid at all times but can't use
> > mutexes. I didn't have the energy to keep going and make the lock
> > objects freeable.
> >
> > I suppose the task could be made a bit easier by using `sigprocmask(2)`
> > or `pthread_sigmask(3)` to make its running environment a little bit
> > less hostile.
>
> I think there are really two levels of carefulness here:
>
>   1. Avoiding complicated things during a signal handler that may rely
>      on having a sane state from the rest of the program (e.g.,
>      half-formed entries, stdio locks, etc).
>
>   2. Being truly race-free in the face of a signal arriving while we're
>      running arbitrary code that might have a tempfile struct in a funny
>      state.
>
> I feel like right now we meet (1) and not (2). But I think if we keep to
> that lower bar of (1), it might not be that bad. We're assuming now that
> there's no race on the tempfile->active flag, for instance. We could
> probably make a similar assumption about putting items onto or taking
> them off of a linked list (it's not really atomic, but a single pointer
> assignment is probably "atomic enough" for our purposes).

Something like this, which AFAICT is about as safe as the existing code
in its list manipulation. It retains the "active" flag as an additional
check which I think isn't strictly necessary, but potentially catches
some logic errors.

The strbuf_reset() calls become strbuf_release(), since we're promising
the caller that they could now free the struct (or let it go out of
scope) if they chose. Probably during a signal handler we should skip
that (we know the struct is off the list and non-active at that point,
but we could possibly hit libc's free() mutex).
diff --git a/tempfile.c b/tempfile.c
index 6843710670..a7d964ebf8 100644
--- a/tempfile.c
+++ b/tempfile.c
@@ -69,7 +69,6 @@ static void remove_tempfiles(int skip_fclose)
 				tempfile_list->fp = NULL;
 			delete_tempfile(tempfile_list);
 		}
-		tempfile_list = tempfile_list->next;
 	}
 }
 
@@ -85,39 +84,54 @@ static void remove_tempfiles_on_signal(int signo)
 	raise(signo);
 }
 
-/*
- * Initialize *tempfile if necessary and add it to tempfile_list.
- */
-static void prepare_tempfile_object(struct tempfile *tempfile)
+static void init_tempfile(struct tempfile *tempfile)
+{
+	tempfile->fd = -1;
+	tempfile->fp = NULL;
+	tempfile->active = 0;
+	tempfile->owner = 0;
+	tempfile->next = NULL;
+	strbuf_init(&tempfile->filename, 0);
+}
+
+static void activate_tempfile(struct tempfile *tempfile)
 {
-	if (!tempfile_list) {
-		/* One-time initialization */
+	static volatile int initialized;
+
+	if (!initialized) {
 		sigchain_push_common(remove_tempfiles_on_signal);
 		atexit(remove_tempfiles_on_exit);
+		initialized = 1;
 	}
 
 	if (tempfile->active)
-		die("BUG: prepare_tempfile_object called for active object");
-	if (!tempfile->on_list) {
-		/* Initialize *tempfile and add it to tempfile_list: */
-		tempfile->fd = -1;
-		tempfile->fp = NULL;
-		tempfile->active = 0;
-		tempfile->owner = 0;
-		strbuf_init(&tempfile->filename, 0);
-		tempfile->next = tempfile_list;
-		tempfile_list = tempfile;
-		tempfile->on_list = 1;
-	} else if (tempfile->filename.len) {
-		/* This shouldn't happen, but better safe than sorry.
*/
-		die("BUG: prepare_tempfile_object called for improperly-reset object");
+		die("BUG: activate_tempfile called for active object");
+
+	tempfile->next = tempfile_list;
+	tempfile_list = tempfile;
+	tempfile->active = 1;
+}
+
+static void deactivate_tempfile(struct tempfile *tempfile)
+{
+	struct tempfile *volatile *p;
+
+	if (!tempfile->active)
+		return;
+
+	tempfile->active = 0;
+	for (p = &tempfile_list; *p; p = &(*p)->next) {
+		if (*p == tempfile) {
+			*p = tempfile->next;
+			break;
+		}
+	}
 }
 
 /* Make sure errno contains a meaningful value on error */
 int create_tempfile(struct tempfile *tempfile, const char *path)
 {
-	prepare_tempfile_object(tempfile);
+	init_tempfile(tempfile);
 
 	strbuf_add_absolute_path(&tempfile->filename, path);
 	tempfile->fd = open(tempfile->filename.buf,
@@ -127,11 +141,11 @@ int create_tempfile(struct tempfile *tempfile, const char *path)
 	tempfile->fd =
Re: [PATCH] config: use a static lock_file struct
On Wed, Aug 30, 2017 at 6:31 AM, Jeff King wrote:
> On Wed, Aug 30, 2017 at 05:25:18AM +0200, Michael Haggerty wrote:
>> It was surprisingly hard trying to get that code to do the right thing,
>> non-racily, in every error path. Since `remove_tempfiles()` can be
>> called any time (even from a signal handler), the linked list of
>> `tempfile` objects has to be kept valid at all times but can't use
>> mutexes. I didn't have the energy to keep going and make the lock
>> objects freeable.
>>
>> I suppose the task could be made a bit easier by using `sigprocmask(2)`
>> or `pthread_sigmask(3)` to make its running environment a little bit
>> less hostile.
>
> I think there are really two levels of carefulness here:
>
>   1. Avoiding complicated things during a signal handler that may rely
>      on having a sane state from the rest of the program (e.g.,
>      half-formed entries, stdio locks, etc).
>
>   2. Being truly race-free in the face of a signal arriving while we're
>      running arbitrary code that might have a tempfile struct in a funny
>      state.
>
> I feel like right now we meet (1) and not (2). But I think if we keep to
> that lower bar of (1), it might not be that bad. We're assuming now that
> there's no race on the tempfile->active flag, for instance. We could
> probably make a similar assumption about putting items onto or taking
> them off of a linked list (it's not really atomic, but a single pointer
> assignment is probably "atomic enough" for our purposes).
>
> Or I dunno. There's a lot of "volatile" modifiers sprinkled around.
> Maybe those are enough to give us (2) as well (though in that case, I
> think we'd still be as OK with the list manipulation as we are with the
> active flag manipulation).

I did in fact strive for both (1) and (2), though I know that there are
a couple of short races remaining in the code (plus who knows how many
bugs).
Actually, I think you're right that a single "probably-atomic" pointer
assignment would be enough to remove an entry from the list safely. I
was thinking about what it would take to make the list thread-safe
(which it currently is not, but probably doesn't need to be(?)).

Modifying the linked list is not that difficult if you assume that there
is only a single thread modifying the list at any given time.

Michael
Re: [PATCH] config: use a static lock_file struct
On Wed, Aug 30, 2017 at 05:25:18AM +0200, Michael Haggerty wrote:

> >>> In the long run we may want to drop the "tempfiles must remain forever"
> >>> rule. This is certainly not the first time it has caused confusion or
> >>> leaks. And I don't think it's a fundamental issue, just the way the code
> >>> is written. But in the interim, this fix is probably worth doing.
> >
> > The main issue is that the caller needs to make sure they're removed
> > from the list (via commit or rollback) before being freed.
> >
> > As far as I know anyway. This restriction dates back to the very early
> > days of the lockfile code and has been carried through the various
> > tempfile-cleanup refactorings over the years (mostly because each was
> > afraid to make functional changes).
> >
> > +cc Michael, who did most comprehensive cleanup of that code.
>
> It was surprisingly hard trying to get that code to do the right thing,
> non-racily, in every error path. Since `remove_tempfiles()` can be
> called any time (even from a signal handler), the linked list of
> `tempfile` objects has to be kept valid at all times but can't use
> mutexes. I didn't have the energy to keep going and make the lock
> objects freeable.
>
> I suppose the task could be made a bit easier by using `sigprocmask(2)`
> or `pthread_sigmask(3)` to make its running environment a little bit
> less hostile.

I think there are really two levels of carefulness here:

  1. Avoiding complicated things during a signal handler that may rely
     on having a sane state from the rest of the program (e.g.,
     half-formed entries, stdio locks, etc).

  2. Being truly race-free in the face of a signal arriving while we're
     running arbitrary code that might have a tempfile struct in a funny
     state.

I feel like right now we meet (1) and not (2). But I think if we keep to
that lower bar of (1), it might not be that bad. We're assuming now that
there's no race on the tempfile->active flag, for instance.
We could probably make a similar assumption about putting items onto or
taking them off of a linked list (it's not really atomic, but a single
pointer assignment is probably "atomic enough" for our purposes).

Or I dunno. There's a lot of "volatile" modifiers sprinkled around.
Maybe those are enough to give us (2) as well (though in that case, I
think we'd still be as OK with the list manipulation as we are with the
active flag manipulation).

-Peff
Re: [PATCH] config: use a static lock_file struct
On 08/29/2017 09:12 PM, Jeff King wrote:
> On Tue, Aug 29, 2017 at 12:09:28PM -0700, Brandon Williams wrote:
>
>>> -- >8 --
>>> Subject: [PATCH] config: use a static lock_file struct
>>>
>>> When modifying git config, we xcalloc() a struct lock_file
>>> but never free it. This is necessary because the tempfile
>>> code (upon which the locking code is built) requires that
>>> the resulting struct remain valid through the life of the
>>> program. However, it also confuses leak-checkers like
>>> valgrind because only the inner "struct tempfile" is still
>>> reachable; no pointer to the outer lock_file is kept.
>>
>> Is this just due to a limitation in the tempfile code? Would it be
>> possible to improve the tempfile code such that we don't need to require
>> that a tempfile, once created, is required to exist for the remaining
>> life of the program?
>
> Yes. Like I wrote below:
>
>>> ---
>>> In the long run we may want to drop the "tempfiles must remain forever"
>>> rule. This is certainly not the first time it has caused confusion or
>>> leaks. And I don't think it's a fundamental issue, just the way the code
>>> is written. But in the interim, this fix is probably worth doing.
>
> The main issue is that the caller needs to make sure they're removed
> from the list (via commit or rollback) before being freed.
>
> As far as I know anyway. This restriction dates back to the very early
> days of the lockfile code and has been carried through the various
> tempfile-cleanup refactorings over the years (mostly because each was
> afraid to make functional changes).
>
> +cc Michael, who did most comprehensive cleanup of that code.

It was surprisingly hard trying to get that code to do the right thing,
non-racily, in every error path. Since `remove_tempfiles()` can be
called any time (even from a signal handler), the linked list of
`tempfile` objects has to be kept valid at all times but can't use
mutexes.
I didn't have the energy to keep going and make the lock objects
freeable.

I suppose the task could be made a bit easier by using `sigprocmask(2)`
or `pthread_sigmask(3)` to make its running environment a little bit
less hostile.

Michael
Re: [RFC 0/7] transitioning to protocol v2
On Tue, Aug 29, 2017 at 04:08:25PM -0400, Jeff Hostetler wrote:

> I just wanted to jump in here and say I've done some initial
> testing of this against VSTS and so far it seems fine. And yes,
> we have a custom git server.

Great, thank you for checking.

> VSTS doesn't support the "git://" protocol, so the double-null trick
> isn't an issue for us. But "https://" worked just fine. I'm still
> asking around internally whether we support passing SSH environment
> variables.

The key thing for ssh is not whether you support passing environment
variables. It's whether you quietly ignore unknown variables rather than
cutting off the connection. To support the v2 protocol you'd need to
pass the new variables, but you'd also need to modify your server to
actually do something useful with them anyway. At this point we're
mostly concerned with whether we can safely pass the variables to
current implementations unconditionally and get a reasonable outcome.

To be honest, I'm not all that worried about VSTS either way. It's a
centralized service which will likely get v2 extensions implemented in a
timely manner (once they exist). I'm much more concerned about
shrink-wrap implementations deployed in the wild, which may hang around
without being upgraded for years. If v2-aware clients send requests that
cause those old implementations to choke, users won't be happy.

-Peff
Re: [PATCH 04/10] packed_delete_refs(): implement method
On 08/29/2017 08:07 PM, Brandon Williams wrote:
> On 08/29, Michael Haggerty wrote:
>> [...]
>> diff --git a/refs/packed-backend.c b/refs/packed-backend.c
>> index d19b3bfba5..83a088118f 100644
>> --- a/refs/packed-backend.c
>> +++ b/refs/packed-backend.c
>> @@ -1082,7 +1082,50 @@ static int packed_initial_transaction_commit(struct ref_store *ref_store,
>>  static int packed_delete_refs(struct ref_store *ref_store, const char *msg,
>>  			      struct string_list *refnames, unsigned int flags)
>>  {
>> -	die("BUG: not implemented yet");
>> +	struct packed_ref_store *refs =
>> +		packed_downcast(ref_store, REF_STORE_WRITE, "delete_refs");
>> +	struct strbuf err = STRBUF_INIT;
>> +	struct ref_transaction *transaction;
>> +	struct string_list_item *item;
>> +	int ret;
>> +
>> +	(void)refs; /* We need the check above, but don't use the variable */
>
> Can't say I've seen a line like this before. Is the intent just to mark
> that this variable won't be used, like you mention in the comment?

The `(void)refs;` part tells the compiler not to complain about a
variable that is defined but never used.

So why define the variable in the first place? I could instead do

	(void)packed_downcast(ref_store, REF_STORE_WRITE, "delete_refs");

My reason (which people might disagree with) is that I like to keep this
boilerplate consistent from method to method. To switch to the second
variant, not only would the line look different, but it would also have
to be moved down several lines, past the other variables' declarations,
to avoid the dreaded "ISO C90 forbids mixed declarations and code". I'd
prefer to be able to mix declarations and code, but the project's style
prohibits it.

Michael
Re: [PATCH v3 1/3] refs/files-backend: add longer-scoped copy of string to list
v3 looks good to me. Thanks!

Reviewed-by: Michael Haggerty

Michael
Re: [PATCH] pkt-line: re-'static'-ify buffer in packet_write_fmt_1()
On Tue, Aug 29, 2017 at 09:22:08PM +0200, Martin Ågren wrote:

> One take-away for me from this thread is that memory-leak-plugging seems
> to be appreciated, i.e., it's not just an academic problem. I think I'll
> look into it some more, i.e., slowly pursue option 2. :-) That might help
> give some insights on how to identify interesting leaks.

Yes, I think if it's tractable that's the best path forward. It's not
clear to me how many of the things valgrind reports are false positives
(and in what way). Certainly the lock_file patch I just posted seems
like something worth addressing, even if we _tend_ to only leak a small
number of such structs per program invocation. It's not a problem now,
but it's a sign of a bad pattern.

-Peff
Re: [RFC 0/7] transitioning to protocol v2
On 08/29, Jeff Hostetler wrote:
> On 8/25/2017 1:35 PM, Jonathan Nieder wrote:
> > Hi,
> >
> > Jeff King wrote:
> > > On Thu, Aug 24, 2017 at 03:53:21PM -0700, Brandon Williams wrote:
> >
> > > > Another version of Git's wire protocol is a topic that has been
> > > > discussed and attempted by many in the community over the years. The
> > > > biggest challenge, as far as I understand, has been coming up with a
> > > > transition plan to using the new server without breaking existing
> > > > clients and servers. As such this RFC is really only concerned with
> > > > solidifying a transition plan. Once it has been decided how we can
> > > > transition to a new protocol we can get into deciding what this new
> > > > protocol would look like (though it would obviously eliminate the ref
> > > > advertisement ;).
> > >
> > > I don't think libgit2 implements the server side. That leaves probably
> > > JGit, Microsoft's VSTS (which I think is custom), and whatever
> > > Atlassian and GitLab use.
> >
> > I'd be happy if someone tests the patches against those. :)
>
> I just wanted to jump in here and say I've done some initial
> testing of this against VSTS and so far it seems fine. And yes,
> we have a custom git server.
>
> VSTS doesn't support the "git://" protocol, so the double-null trick
> isn't an issue for us. But "https://" worked just fine. I'm still
> asking around internally whether we support passing SSH environment
> variables.
>
> Jeff

Thanks for checking on this, I really appreciate it. Please let me know
if anything I haven't thought of becomes an issue. I'm currently working
on getting these patches into a more polished state to be used (as
discussed elsewhere on this thread) as a precursor to an actual v2.

-- 
Brandon Williams
Re: [RFC 0/7] transitioning to protocol v2
On 8/25/2017 1:35 PM, Jonathan Nieder wrote:
> Hi,
>
> Jeff King wrote:
> > On Thu, Aug 24, 2017 at 03:53:21PM -0700, Brandon Williams wrote:
>
> > > Another version of Git's wire protocol is a topic that has been
> > > discussed and attempted by many in the community over the years. The
> > > biggest challenge, as far as I understand, has been coming up with a
> > > transition plan to using the new server without breaking existing
> > > clients and servers. As such this RFC is really only concerned with
> > > solidifying a transition plan. Once it has been decided how we can
> > > transition to a new protocol we can get into deciding what this new
> > > protocol would look like (though it would obviously eliminate the ref
> > > advertisement ;).
> >
> > I don't think libgit2 implements the server side. That leaves probably
> > JGit, Microsoft's VSTS (which I think is custom), and whatever
> > Atlassian and GitLab use.
>
> I'd be happy if someone tests the patches against those. :)

I just wanted to jump in here and say I've done some initial testing of
this against VSTS and so far it seems fine. And yes, we have a custom
git server.

VSTS doesn't support the "git://" protocol, so the double-null trick
isn't an issue for us. But "https://" worked just fine. I'm still asking
around internally whether we support passing SSH environment variables.

Jeff
Re: [PATCH] pkt-line: re-'static'-ify buffer in packet_write_fmt_1()
On 29 August 2017 at 20:53, Jeff King wrote:
> On Tue, Aug 29, 2017 at 06:51:52PM +0100, Lars Schneider wrote:
>> What if we run a few selected tests with valgrind and count all files that
>> valgrind mentions (a single leak has multiple file mentions because of
>> the stack trace and other leak indicators). We record these counts and let
>> TravisCI scream if one of the numbers increases.
>>
>> I wonder how stable/fragile such a metric would be as a simple refactoring
>> could easily change these numbers. Below I ran valgrind on t5510 before and
>> after Martin's patch. The diff below clearly shows the pkt-line leak.
>>
>> Would it make sense to pursue something like this in TravisCI to avoid
>> "pkt-line" kind of leaks in the future?
>
> I don't think that would work, because simply adding new tests would
> bump the leak count, without the code actually growing worse.
>
> But think about your strategy for a moment: what you're really trying to
> do is say "these existing leaks are OK because they are too many for us
> to count, but we want to make sure we don't add _new_ ones". We already
> have two techniques for distinguishing old ones from new ones:
>
>   1. Mark existing ones with valgrind suppressions so they do not warn
>      at all.
>
>   2. Fix them, so that the "existing" count drops to zero.
>
> Option (2), of course, has the added side effect that it's actually
> fixing potential problems. :)

One idea would be to report that "t1234 used to complete without leaks,
but now there is a leak". Of course, the leak could be triggered by some
"irrelevant" setup step that was just added and not by the functionality
that is actually tested. But maybe then the reaction might be "oh, let's
fix this one leak so that this test is leak-free again". If that then
also reduces the leakiness of some other tests, good. That might help
against the "there are too many leaks, where should I start?"-effect.

Or similarly "This stack-trace was reported. I haven't seen it before."
It could be a refactoring, it could be an old leak that hasn't been
triggered in that way before, or it could be a new leak. But to be
honest, I think that only works -- in a psychological sense -- once the
number of leaks is small enough, however much that is.

One take-away for me from this thread is that memory-leak-plugging seems
to be appreciated, i.e., it's not just an academic problem. I think I'll
look into it some more, i.e., slowly pursue option 2. :-) That might
help give some insights on how to identify interesting leaks.

Martin
Re: [PATCH] config: use a static lock_file struct
On Tue, Aug 29, 2017 at 12:09:28PM -0700, Brandon Williams wrote:

> > -- >8 --
> > Subject: [PATCH] config: use a static lock_file struct
> >
> > When modifying git config, we xcalloc() a struct lock_file
> > but never free it. This is necessary because the tempfile
> > code (upon which the locking code is built) requires that
> > the resulting struct remain valid through the life of the
> > program. However, it also confuses leak-checkers like
> > valgrind because only the inner "struct tempfile" is still
> > reachable; no pointer to the outer lock_file is kept.
>
> Is this just due to a limitation in the tempfile code? Would it be
> possible to improve the tempfile code such that we don't need to require
> that a tempfile, once created, is required to exist for the remaining
> life of the program?

Yes. Like I wrote below:

> > ---
> > In the long run we may want to drop the "tempfiles must remain forever"
> > rule. This is certainly not the first time it has caused confusion or
> > leaks. And I don't think it's a fundamental issue, just the way the code
> > is written. But in the interim, this fix is probably worth doing.

The main issue is that the caller needs to make sure they're removed
from the list (via commit or rollback) before being freed.

As far as I know anyway. This restriction dates back to the very early
days of the lockfile code and has been carried through the various
tempfile-cleanup refactorings over the years (mostly because each was
afraid to make functional changes).

+cc Michael, who did most comprehensive cleanup of that code.

-Peff
Re: [PATCH] config: use a static lock_file struct
On 08/29, Brandon Williams wrote: > On 08/29, Jeff King wrote: > > On Tue, Aug 29, 2017 at 02:53:41PM -0400, Jeff King wrote: > > > > > It looks like the config code has a minor-ish leak. Patch to follow. > > > > Here it is. > > > > -- >8 -- > > Subject: [PATCH] config: use a static lock_file struct > > > > When modifying git config, we xcalloc() a struct lock_file > > but never free it. This is necessary because the tempfile > > code (upon which the locking code is built) requires that > > the resulting struct remain valid through the life of the > > program. However, it also confuses leak-checkers like > > valgrind because only the inner "struct tempfile" is still > > reachable; no pointer to the outer lock_file is kept. > > Is this just due to a limitation in the tempfile code? Would it be > possible to improve the tempfile code such that we don't need to require > that a tempfile, once created, is required to exist for the remaining > life of the program? > > > > > Other code paths solve this by using a single static lock > > struct. We can do the same here, because we know that we'll > > only lock and modify one config file at a time (and > > assertions within the lockfile code will ensure that this > > remains the case). > > > > That removes a real leak (when we fail to free the struct > > after locking fails) as well as removes the valgrind false > > positive. It also means that doing N sequential > > config-writes will use a constant amount of memory, rather > > than leaving stale lock_files for each. > > > > Note that since "lock" is no longer a pointer, it can't be > > NULL anymore. But that's OK. We used that feature only to > > avoid calling rollback_lock_file() on an already-committed > > lock. Since the lockfile code keeps its own "active" flag, > > it's a noop to rollback an inactive lock, and we don't have > > to worry about this ourselves. 
> > > > Signed-off-by: Jeff King> > --- > > In the long run we may want to drop the "tempfiles must remain forever" > > rule. This is certainly not the first time it has caused confusion or > > leaks. And I don't think it's a fundamental issue, just the way the code > > is written. But in the interim, this fix is probably worth doing. I didn't read far enough apparently :) I took a look at this once and found that in order to do this we would just need to be careful in modifying a list of tempfiles. > > > > config.c | 24 +++++++----------------- > > 1 file changed, 7 insertions(+), 17 deletions(-) > > > > diff --git a/config.c b/config.c > > index d0d8ce823a..1603f96e40 100644 > > --- a/config.c > > +++ b/config.c > > @@ -2450,7 +2450,7 @@ int git_config_set_multivar_in_file_gently(const char > > *config_filename, > > { > > int fd = -1, in_fd = -1; > > int ret; > > - struct lock_file *lock = NULL; > > + static struct lock_file lock; > > char *filename_buf = NULL; > > char *contents = NULL; > > size_t contents_sz; > > @@ -2469,8 +2469,7 @@ int git_config_set_multivar_in_file_gently(const char > > *config_filename, > > * The lock serves a purpose in addition to locking: the new > > * contents of .git/config will be written into it.
> > */ > > - lock = xcalloc(1, sizeof(struct lock_file)); > > - fd = hold_lock_file_for_update(lock, config_filename, 0); > > + fd = hold_lock_file_for_update(&lock, config_filename, 0); > > if (fd < 0) { > > error_errno("could not lock config file %s", config_filename); > > free(store.key); > > @@ -2583,8 +2582,8 @@ int git_config_set_multivar_in_file_gently(const char > > *config_filename, > > close(in_fd); > > in_fd = -1; > > > > - if (chmod(get_lock_file_path(lock), st.st_mode & 07777) < 0) { > > - error_errno("chmod on %s failed", > > get_lock_file_path(lock)); > > + if (chmod(get_lock_file_path(&lock), st.st_mode & 07777) < 0) { > > + error_errno("chmod on %s failed", > > get_lock_file_path(&lock)); > > ret = CONFIG_NO_WRITE; > > goto out_free; > > } > > @@ -2639,28 +2638,19 @@ int git_config_set_multivar_in_file_gently(const > > char *config_filename, > > contents = NULL; > > } > > > > - if (commit_lock_file(lock) < 0) { > > + if (commit_lock_file(&lock) < 0) { > > error_errno("could not write config file %s", config_filename); > > ret = CONFIG_NO_WRITE; > > - lock = NULL; > > goto out_free; > > } > > > > - /* > > - * lock is committed, so don't try to roll it back below. > > - * NOTE: Since lockfile.c keeps a linked list of all created > > - * lock_file structures, it isn't safe to free(lock). It's > > - * better to just leave it hanging around. > > - */ > > - lock = NULL; > > ret = 0; > > > > /* Invalidate the config cache */ > > git_config_clear(); > > > > out_free: > > - if
Re: [PATCH] config: use a static lock_file struct
On 08/29, Jeff King wrote: > On Tue, Aug 29, 2017 at 02:53:41PM -0400, Jeff King wrote: > > > It looks like the config code has a minor-ish leak. Patch to follow. > > Here it is. > > -- >8 -- > Subject: [PATCH] config: use a static lock_file struct > > When modifying git config, we xcalloc() a struct lock_file > but never free it. This is necessary because the tempfile > code (upon which the locking code is built) requires that > the resulting struct remain valid through the life of the > program. However, it also confuses leak-checkers like > valgrind because only the inner "struct tempfile" is still > reachable; no pointer to the outer lock_file is kept. Is this just due to a limitation in the tempfile code? Would it be possible to improve the tempfile code such that we don't need to require that a tempfile, once created, is required to exist for the remaining life of the program? > > Other code paths solve this by using a single static lock > struct. We can do the same here, because we know that we'll > only lock and modify one config file at a time (and > assertions within the lockfile code will ensure that this > remains the case). > > That removes a real leak (when we fail to free the struct > after locking fails) as well as removes the valgrind false > positive. It also means that doing N sequential > config-writes will use a constant amount of memory, rather > than leaving stale lock_files for each. > > Note that since "lock" is no longer a pointer, it can't be > NULL anymore. But that's OK. We used that feature only to > avoid calling rollback_lock_file() on an already-committed > lock. Since the lockfile code keeps its own "active" flag, > it's a noop to rollback an inactive lock, and we don't have > to worry about this ourselves. > > Signed-off-by: Jeff King> --- > In the long run we may want to drop the "tempfiles must remain forever" > rule. This is certainly not the first time it has caused confusion or > leaks. 
And I don't think it's a fundamental issue, just the way the code > is written. But in the interim, this fix is probably worth doing. > > config.c | 24 +++++++----------------- > 1 file changed, 7 insertions(+), 17 deletions(-) > > diff --git a/config.c b/config.c > index d0d8ce823a..1603f96e40 100644 > --- a/config.c > +++ b/config.c > @@ -2450,7 +2450,7 @@ int git_config_set_multivar_in_file_gently(const char > *config_filename, > { > int fd = -1, in_fd = -1; > int ret; > - struct lock_file *lock = NULL; > + static struct lock_file lock; > char *filename_buf = NULL; > char *contents = NULL; > size_t contents_sz; > @@ -2469,8 +2469,7 @@ int git_config_set_multivar_in_file_gently(const char > *config_filename, > * The lock serves a purpose in addition to locking: the new > * contents of .git/config will be written into it. > */ > - lock = xcalloc(1, sizeof(struct lock_file)); > - fd = hold_lock_file_for_update(lock, config_filename, 0); > + fd = hold_lock_file_for_update(&lock, config_filename, 0); > if (fd < 0) { > error_errno("could not lock config file %s", config_filename); > free(store.key); > @@ -2583,8 +2582,8 @@ int git_config_set_multivar_in_file_gently(const char > *config_filename, > close(in_fd); > in_fd = -1; > > - if (chmod(get_lock_file_path(lock), st.st_mode & 07777) < 0) { > - error_errno("chmod on %s failed", > get_lock_file_path(lock)); > + if (chmod(get_lock_file_path(&lock), st.st_mode & 07777) < 0) { > + error_errno("chmod on %s failed", > get_lock_file_path(&lock)); > ret = CONFIG_NO_WRITE; > goto out_free; > } > @@ -2639,28 +2638,19 @@ int git_config_set_multivar_in_file_gently(const char > *config_filename, > contents = NULL; > } > > - if (commit_lock_file(lock) < 0) { > + if (commit_lock_file(&lock) < 0) { > error_errno("could not write config file %s", config_filename); > ret = CONFIG_NO_WRITE; > - lock = NULL; > goto out_free; > } > > - /* > - * lock is committed, so don't try to roll it back below.
> - * NOTE: Since lockfile.c keeps a linked list of all created > - * lock_file structures, it isn't safe to free(lock). It's > - * better to just leave it hanging around. > - */ > - lock = NULL; > ret = 0; > > /* Invalidate the config cache */ > git_config_clear(); > > out_free: > - if (lock) > - rollback_lock_file(lock); > + rollback_lock_file(&lock); > free(filename_buf); > if (contents) > munmap(contents, contents_sz); > @@ -2669,7 +2659,7 @@ int git_config_set_multivar_in_file_gently(const char > *config_filename, > return ret; > > write_err_out: > - ret =
[PATCH] config: use a static lock_file struct
On Tue, Aug 29, 2017 at 02:53:41PM -0400, Jeff King wrote: > It looks like the config code has a minor-ish leak. Patch to follow. Here it is. -- >8 -- Subject: [PATCH] config: use a static lock_file struct When modifying git config, we xcalloc() a struct lock_file but never free it. This is necessary because the tempfile code (upon which the locking code is built) requires that the resulting struct remain valid through the life of the program. However, it also confuses leak-checkers like valgrind because only the inner "struct tempfile" is still reachable; no pointer to the outer lock_file is kept. Other code paths solve this by using a single static lock struct. We can do the same here, because we know that we'll only lock and modify one config file at a time (and assertions within the lockfile code will ensure that this remains the case). That removes a real leak (when we fail to free the struct after locking fails) as well as removes the valgrind false positive. It also means that doing N sequential config-writes will use a constant amount of memory, rather than leaving stale lock_files for each. Note that since "lock" is no longer a pointer, it can't be NULL anymore. But that's OK. We used that feature only to avoid calling rollback_lock_file() on an already-committed lock. Since the lockfile code keeps its own "active" flag, it's a noop to rollback an inactive lock, and we don't have to worry about this ourselves. Signed-off-by: Jeff King --- In the long run we may want to drop the "tempfiles must remain forever" rule. This is certainly not the first time it has caused confusion or leaks. And I don't think it's a fundamental issue, just the way the code is written. But in the interim, this fix is probably worth doing.
config.c | 24 +++++++----------------- 1 file changed, 7 insertions(+), 17 deletions(-) diff --git a/config.c b/config.c index d0d8ce823a..1603f96e40 100644 --- a/config.c +++ b/config.c @@ -2450,7 +2450,7 @@ int git_config_set_multivar_in_file_gently(const char *config_filename, { int fd = -1, in_fd = -1; int ret; - struct lock_file *lock = NULL; + static struct lock_file lock; char *filename_buf = NULL; char *contents = NULL; size_t contents_sz; @@ -2469,8 +2469,7 @@ int git_config_set_multivar_in_file_gently(const char *config_filename, * The lock serves a purpose in addition to locking: the new * contents of .git/config will be written into it. */ - lock = xcalloc(1, sizeof(struct lock_file)); - fd = hold_lock_file_for_update(lock, config_filename, 0); + fd = hold_lock_file_for_update(&lock, config_filename, 0); if (fd < 0) { error_errno("could not lock config file %s", config_filename); free(store.key); @@ -2583,8 +2582,8 @@ int git_config_set_multivar_in_file_gently(const char *config_filename, close(in_fd); in_fd = -1; - if (chmod(get_lock_file_path(lock), st.st_mode & 07777) < 0) { - error_errno("chmod on %s failed", get_lock_file_path(lock)); + if (chmod(get_lock_file_path(&lock), st.st_mode & 07777) < 0) { + error_errno("chmod on %s failed", get_lock_file_path(&lock)); ret = CONFIG_NO_WRITE; goto out_free; } @@ -2639,28 +2638,19 @@ int git_config_set_multivar_in_file_gently(const char *config_filename, contents = NULL; } - if (commit_lock_file(lock) < 0) { + if (commit_lock_file(&lock) < 0) { error_errno("could not write config file %s", config_filename); ret = CONFIG_NO_WRITE; - lock = NULL; goto out_free; } - /* - * lock is committed, so don't try to roll it back below. - * NOTE: Since lockfile.c keeps a linked list of all created - * lock_file structures, it isn't safe to free(lock). It's - * better to just leave it hanging around.
- */ - lock = NULL; ret = 0; /* Invalidate the config cache */ git_config_clear(); out_free: - if (lock) - rollback_lock_file(lock); + rollback_lock_file(&lock); free(filename_buf); if (contents) munmap(contents, contents_sz); @@ -2669,7 +2659,7 @@ int git_config_set_multivar_in_file_gently(const char *config_filename, return ret; write_err_out: - ret = write_error(get_lock_file_path(lock)); + ret = write_error(get_lock_file_path(&lock)); goto out_free; } -- 2.14.1.721.gc5bc1565f1
Re: [PATCH] pkt-line: re-'static'-ify buffer in packet_write_fmt_1()
On Tue, Aug 29, 2017 at 06:51:52PM +0100, Lars Schneider wrote: > I set $TOOL_OPTIONS in valgrind.sh: to > '--leak-check=full --errors-for-leak-kinds=definite' > > ... but I also had to adjust t/test-lib-functions.sh:test_create_repo > as I ran into the error "cannot run git init -- have you built things yet?". Yeah, this is a general problem with run-time analyzers. If we're not mostly error free, then the setup code often breaks before you even get to the interesting part of the test. It looks like the config code has a minor-ish leak. Patch to follow. > What if we run a few selected tests with valgrind and count all files that > valgrind mentions (a single leak has multiple file mentions because of > the stack trace and other leak indicators). We record these counts and let > TravisCI scream if one of the numbers increases. > > I wonder how stable/fragile such a metric would be as a simple refactoring > could easily change these numbers. Below I ran valgrind on t5510 before and > after Martin's patch. The diff below clearly shows the pkt-line leak. > > Would it make sense to pursue something like this in TravisCI to avoid > "pkt-line" kind of leaks in the future? I don't think that would work, because simply adding new tests would bump the leak count, without the code actually growing worse. But think about your strategy for a moment: what you're really trying to do is say "these existing leaks are OK because they are too many for us to count, but we want to make sure we don't add _new_ ones". We already have two techniques for distinguishing old ones from new ones: 1. Mark existing ones with valgrind suppressions so they do not warn at all. 2. Fix them, so that the "existing" count drops to zero. Option (2), of course, has the added side effect that it's actually fixing potential problems. :) -Peff
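For the record, option (1) would mean carrying entries like the following in a valgrind suppressions file. The shape is valgrind's standard suppression format; the frame names below are only an illustrative guess at the leaking call chain, not a tested suppression:

```
{
   known-config-lock-leak
   Memcheck:Leak
   match-leak-kinds: definite
   fun:malloc
   fun:xcalloc
   fun:git_config_set_multivar_in_file_gently
}
```

Such files are passed via `--suppressions=<file>`; the obvious maintenance cost is that every known leak needs a hand-written entry kept in sync with the code.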
Re: [PATCH 00/10] Implement transactions for the packed ref store
On 08/29, Michael Haggerty wrote: > This should be the second-to-last patch series in the quest for > mmapped packed-refs. > > Before this patch series, there is still more coupling than necessary > between `files_ref_store` and `packed_ref_store`: > > * `files_ref_store` adds packed references (e.g., during `git clone` > and `git pack-refs`) by inserting them into the `packed_ref_store`'s > internal `ref_cache`, then calling `commit_packed_refs()`. But > `files_ref_store` shouldn't even *know* that `packed_ref_store` uses > a `ref_cache`, let alone muck about in it. > > * `files_ref_store` deletes packed references by calling > `repack_without_refs()`. > > Instead, implement reference transactions and `delete_refs()` for > `packed_ref_store`, and change `files_ref_store` to use these standard > methods rather than internal `packed_ref_store` methods. > > This patch series finishes up the encapsulation of `packed_ref_store`. > At the end of the series, the outside world only interacts with it via > the refs API plus a couple of locking-related functions. That will > make it easy for the next patch series to change the internal workings > of `packed_ref_store` without affecting `files_ref_store`. > > Along the way, we fix some longstanding problems with reference > updates. Quoting from patch [08/10]: > > First, the old code didn't obtain the `packed-refs` lock until > `files_transaction_finish()`. This means that a failure to acquire the > `packed-refs` lock (e.g., due to contention with another process) > wasn't detected until it was too late (problems like this are supposed > to be detected in the "prepare" phase). The new code acquires the > `packed-refs` lock in `files_transaction_prepare()`, the same stage of > the processing when the loose reference locks are being acquired, > removing another reason why the "prepare" phase might succeed and the > "finish" phase might nevertheless fail. 
> > Second, the old code deleted the loose version of a reference before > deleting any packed version of the same reference. This left a moment > when another process might think that the packed version of the > reference is current, which is incorrect. (Even worse, the packed > version of the reference can be arbitrarily old, and might even point > at an object that has since been garbage-collected.) > > Third, if a reference deletion fails to acquire the `packed-refs` lock > altogether, then the old code might leave the repository in the > incorrect state (possibly corrupt) described in the previous > paragraph. > > Now we activate the new "packed-refs" file (sans any references that > are being deleted) *before* deleting the corresponding loose > references. But we hold the "packed-refs" lock until after the loose > references have been finalized, thus preventing a simultaneous > "pack-refs" process from packing the loose version of the reference in > the time gap, which would otherwise defeat our attempt to delete it. > > This patch series is also available as branch > `packed-ref-transactions` in my fork on GitHub [1]. A draft of the > patch series to change `packed_ref_store` to use mmap and get rid of > its `ref_cache` is available as branch `mmap-packed-refs` from the > same source. Overall the patches look sane to me, though I don't believe I'm qualified in this area to give you a complete thumbs up since I don't understand the refs code super well yet. I do like reading patches from you as you do a great job of laying out what you are doing in code, comments and commit messages, something I'm trying to get better at :) -- Brandon Williams
Re: [PATCH 09/10] packed-backend: rip out some now-unused code
On 08/29, Michael Haggerty wrote: > Now the outside world interacts with the packed ref store only via the > generic refs API plus a few lock-related functions. This allows us to > delete some functions that are no longer used, thereby completing the > encapsulation of the packed ref store. Love seeing patches which only remove old code :) > > Signed-off-by: Michael Haggerty> --- > refs/packed-backend.c | 193 > -- > refs/packed-backend.h | 8 --- > 2 files changed, 201 deletions(-) > > diff --git a/refs/packed-backend.c b/refs/packed-backend.c > index 83a088118f..90f44c1fbb 100644 > --- a/refs/packed-backend.c > +++ b/refs/packed-backend.c > @@ -91,19 +91,6 @@ struct ref_store *packed_ref_store_create(const char *path, > return ref_store; > } > > -/* > - * Die if refs is not the main ref store. caller is used in any > - * necessary error messages. > - */ > -static void packed_assert_main_repository(struct packed_ref_store *refs, > - const char *caller) > -{ > - if (refs->store_flags & REF_STORE_MAIN) > - return; > - > - die("BUG: operation %s only allowed for main ref store", caller); > -} > - > /* > * Downcast `ref_store` to `packed_ref_store`. Die if `ref_store` is > * not a `packed_ref_store`. Also die if `packed_ref_store` doesn't > @@ -321,40 +308,6 @@ static struct ref_dir *get_packed_refs(struct > packed_ref_store *refs) > return get_packed_ref_dir(get_packed_ref_cache(refs)); > } > > -/* > - * Add or overwrite a reference in the in-memory packed reference > - * cache. This may only be called while the packed-refs file is locked > - * (see packed_refs_lock()). To actually write the packed-refs file, > - * call commit_packed_refs(). 
> - */ > -void add_packed_ref(struct ref_store *ref_store, > - const char *refname, const struct object_id *oid) > -{ > - struct packed_ref_store *refs = > - packed_downcast(ref_store, REF_STORE_WRITE, > - "add_packed_ref"); > - struct ref_dir *packed_refs; > - struct ref_entry *packed_entry; > - > - if (!is_lock_file_locked(&refs->lock)) > - die("BUG: packed refs not locked"); > - > - if (check_refname_format(refname, REFNAME_ALLOW_ONELEVEL)) > - die("Reference has invalid format: '%s'", refname); > - > - packed_refs = get_packed_refs(refs); > - packed_entry = find_ref_entry(packed_refs, refname); > - if (packed_entry) { > - /* Overwrite the existing entry: */ > - oidcpy(&packed_entry->u.value.oid, oid); > - packed_entry->flag = REF_ISPACKED; > - oidclr(&packed_entry->u.value.peeled); > - } else { > - packed_entry = create_ref_entry(refname, oid, REF_ISPACKED); > - add_ref_entry(packed_refs, packed_entry); > - } > -} > - > /* > * Return the ref_entry for the given refname from the packed > * references. If it does not exist, return NULL. > @@ -592,152 +545,6 @@ int packed_refs_is_locked(struct ref_store *ref_store) > static const char PACKED_REFS_HEADER[] = > "# pack-refs with: peeled fully-peeled \n"; > > -/* > - * Write the current version of the packed refs cache from memory to > - * disk. The packed-refs file must already be locked for writing (see > - * packed_refs_lock()). Return zero on success. On errors, rollback > - * the lockfile, write an error message to `err`, and return a nonzero > - * value.
> - */ > -int commit_packed_refs(struct ref_store *ref_store, struct strbuf *err) > -{ > - struct packed_ref_store *refs = > - packed_downcast(ref_store, REF_STORE_WRITE | REF_STORE_MAIN, > - "commit_packed_refs"); > - struct packed_ref_cache *packed_ref_cache = > - get_packed_ref_cache(refs); > - int ok; > - int ret = -1; > - struct strbuf sb = STRBUF_INIT; > - FILE *out; > - struct ref_iterator *iter; > - char *packed_refs_path; > - > - if (!is_lock_file_locked(&refs->lock)) > - die("BUG: commit_packed_refs() called when unlocked"); > - > - /* > - * If packed-refs is a symlink, we want to overwrite the > - * symlinked-to file, not the symlink itself. Also, put the > - * staging file next to it: > - */ > - packed_refs_path = get_locked_file_path(&refs->lock); > - strbuf_addf(&sb, "%s.new", packed_refs_path); > - if (create_tempfile(&refs->tempfile, sb.buf) < 0) { > - strbuf_addf(err, "unable to create file %s: %s", > - sb.buf, strerror(errno)); > - strbuf_release(&sb); > - goto out; > - } > - strbuf_release(&sb); > - > - out = fdopen_tempfile(&refs->tempfile, "w"); > - if (!out) { > - strbuf_addf(err, "unable to fdopen packed-refs tempfile: %s", > - strerror(errno)); > - goto error; > - } > - > -
Re: [PATCH 04/10] packed_delete_refs(): implement method
On 08/29, Michael Haggerty wrote: > Implement `packed_delete_refs()` using a reference transaction. This > means that `files_delete_refs()` can use `refs_delete_refs()` instead > of `repack_without_refs()` to delete any packed references, decreasing > the coupling between the classes. > > Signed-off-by: Michael Haggerty> --- > refs/files-backend.c | 2 +- > refs/packed-backend.c | 45 - > 2 files changed, 45 insertions(+), 2 deletions(-) > > diff --git a/refs/files-backend.c b/refs/files-backend.c > index fccbc24ac4..2c78f63494 100644 > --- a/refs/files-backend.c > +++ b/refs/files-backend.c > @@ -1157,7 +1157,7 @@ static int files_delete_refs(struct ref_store > *ref_store, const char *msg, > if (packed_refs_lock(refs->packed_ref_store, 0, &err)) > goto error; > > - if (repack_without_refs(refs->packed_ref_store, refnames, &err)) { > + if (refs_delete_refs(refs->packed_ref_store, msg, refnames, flags)) { > packed_refs_unlock(refs->packed_ref_store); > goto error; > } > diff --git a/refs/packed-backend.c b/refs/packed-backend.c > index d19b3bfba5..83a088118f 100644 > --- a/refs/packed-backend.c > +++ b/refs/packed-backend.c > @@ -1082,7 +1082,50 @@ static int packed_initial_transaction_commit(struct > ref_store *ref_store, > static int packed_delete_refs(struct ref_store *ref_store, const char *msg, > struct string_list *refnames, unsigned int flags) > { > - die("BUG: not implemented yet"); > + struct packed_ref_store *refs = > + packed_downcast(ref_store, REF_STORE_WRITE, "delete_refs"); > + struct strbuf err = STRBUF_INIT; > + struct ref_transaction *transaction; > + struct string_list_item *item; > + int ret; > + > + (void)refs; /* We need the check above, but don't use the variable */ Can't say I've seen a line like this before. Is the intent just to mark that this variable won't be used, like you mention in the comment?
> + > + if (!refnames->nr) > + return 0; > + > + /* > + * Since we don't check the references' old_oids, the > + * individual updates can't fail, so we can pack all of the > + * updates into a single transaction. > + */ > + > + transaction = ref_store_transaction_begin(ref_store, &err); > + if (!transaction) > + return -1; > + > + for_each_string_list_item(item, refnames) { > + if (ref_transaction_delete(transaction, item->string, NULL, > +flags, msg, &err)) { > + warning(_("could not delete reference %s: %s"), > + item->string, err.buf); > + strbuf_reset(&err); > + } > + } > + > + ret = ref_transaction_commit(transaction, &err); > + > + if (ret) { > + if (refnames->nr == 1) > + error(_("could not delete reference %s: %s"), > + refnames->items[0].string, err.buf); > + else > + error(_("could not delete references: %s"), err.buf); > + } > + > + ref_transaction_free(transaction); > + strbuf_release(&err); > + return ret; > } > > static int packed_pack_refs(struct ref_store *ref_store, unsigned int flags) > -- > 2.14.1 > -- Brandon Williams
Re: [PATCH] pkt-line: re-'static'-ify buffer in packet_write_fmt_1()
> On 28 Aug 2017, at 06:11, Martin Ågren wrote: > > On 28 August 2017 at 01:23, Jeff King wrote: >> On Sun, Aug 27, 2017 at 10:04:55PM +0200, Lars Schneider wrote: >> >>> I did run all tests under valgrind using "make valgrind" and I found >>> the following files with potential issues: >>> >>> cat valgrind.out | perl -nE 'say /^==.+\((.+)\:.+\)$/' | sort | uniq -c >>> 7102 >>> 2 clone.c >>> 33 common-main.c >>> 6 connect.c >>> 64 git.c >>> 4 ls-remote.c >>> 126 run-command.c >>> 12 transport.c >>> 7 worktree.c >> >> I'm not sure where valgrind.out comes from. The individual >> test-results/*.out files may have valgrind output, but I don't think >> they usually contain leak output. >> >> Doing "valgrind ./git-upload-pack . /dev/null" mentions >> leaked memory but not the locations. Adding --leak-check=full shows that >> most of it comes from format_packet(). >> >> And applying Martin's patch drops the "definitely lost" category down to >> 0 bytes (there's still 550k in "still reachable", but those are in the >> "exit will free them for us" category). >> >>> No mention of "pkt-line.c". Did you run Git with valgrind on one of >>> your repositories to find it? >> >> I'm curious, too. I don't think the valgrind setup in our test suite is >> great for finding leaks right now. > > Sorry for being brief. I've patched t/valgrind/valgrind.sh to say > "--leak-check=yes". Then I run "./t --valgrind", simply because > running the complete suite gives more reports than I could possibly > process. I set $TOOL_OPTIONS in valgrind.sh: to '--leak-check=full --errors-for-leak-kinds=definite' ... but I also had to adjust t/test-lib-functions.sh:test_create_repo as I ran into the error "cannot run git init -- have you built things yet?". With these changes I was able to see the leak running valgrind with t5110.
--- What if we run a few selected tests with valgrind and count all files that valgrind mentions (a single leak has multiple file mentions because of the stack trace and other leak indicators). We record these counts and let TravisCI scream if one of the numbers increases. I wonder how stable/fragile such a metric would be as a simple refactoring could easily change these numbers. Below I ran valgrind on t5510 before and after Martin's patch. The diff below clearly shows the pkt-line leak. Would it make sense to pursue something like this in TravisCI to avoid "pkt-line" kind of leaks in the future? ## Valgrind run with leak $ ./t5510-fetch.sh --valgrind | perl -nE 'say /^==.+\((.+)\:.+\)$/' | sort | uniq -c 15529 39 abspath.c 30 add.c 34 branch.c 10 bundle.c 268 clone.c 13 commit.c 471 common-main.c 52 config.c 50 connect.c 4 connected.c 1 diff-lib.c 102 dir.c 120 environment.c 4 fetch-pack.c 47 fetch.c 14 files-backend.c 1 git-compat-util.h 871 git.c 96 init-db.c 26 iterator.c 1 list-objects.c 4 log-tree.c 2 object.c 1 pack-objects.c 6 parse-options.c 10 pathspec.c 83 pkt-line.c 6 precompose_utf8.c 2 push.c 67 refs.c 96 remote.c 90 repository.c 4 rev-list.c 10 revision.c 165 run-command.c 42 setup.c 288 strbuf.c 51 strbuf.h 2 tag.c 107 transport.c 10 tree-diff.c 85 upload-pack.c 1 usage.c 455 wrapper.c ## Valgrind run with Martin's patch to fix the leak $ ./t5510-fetch.sh --valgrind | perl -nE 'say /^==.+\((.+)\:.+\)$/' | sort | uniq -c 14931 39 abspath.c 30 add.c 34 branch.c 10 bundle.c 268 clone.c 13 commit.c 433 common-main.c 52 config.c 50 connect.c 6 connected.c 1 diff-lib.c 102 dir.c 120 environment.c 4 fetch-pack.c 53 fetch.c 14 files-backend.c 1 git-compat-util.h 879 git.c 96 init-db.c 1 iterator.c 1 list-objects.c 4 log-tree.c 2 object.c 1 pack-objects.c 6 parse-options.c 10 pathspec.c 6 precompose_utf8.c 2 push.c 26 refs.c 96 remote.c 90 repository.c 6 rev-list.c 14 revision.c 171 run-command.c 42 setup.c 246 strbuf.c 50 strbuf.h 2 tag.c 107 transport.c
10 tree-diff.c 2 upload-pack.c 1 usage.c 415 wrapper.c ## Diff (before | after; changed rows marked with '|')
  15529                |   14931
     39 abspath.c           39 abspath.c
     30 add.c               30 add.c
     34 branch.c            34 branch.c
     10 bundle.c            10 bundle.c
    268 clone.c            268 clone.c
     13 commit.c            13 commit.c
    471 common-main.c   |  433 common-main.c
     52 config.c            52 config.c
     50 connect.c           50 connect.c
      4 connected.c     |    6 connected.c
      1 diff-lib.c           1 diff-lib.c
    102 dir.c              102 dir.c
    120 environment.c      120 environment.c
      4 fetch-pack.c         4 fetch-pack.c
     47 fetch.c         |   53 fetch.c
     14 files-backend.c     14 files-backend.c
      1 git-compat-util.h    1 git-compat-util.h
    871 git.c           |  879 git.c
     96 init-db.c           96 init-db.c
     26
[PATCH v3 3/3] refs/files-backend: correct return value in lock_ref_for_update
In one code path we return a literal -1 and not a symbolic constant. The value -1 would be interpreted as TRANSACTION_NAME_CONFLICT, which is wrong. Use TRANSACTION_GENERIC_ERROR instead (that is the only other return value we have to choose from). Noticed-by: Michael Haggerty Signed-off-by: Martin Ågren --- v3: this is a new patch, as suggested by Michael refs/files-backend.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/refs/files-backend.c b/refs/files-backend.c index a2b3df21b..ad05d1d5f 100644 --- a/refs/files-backend.c +++ b/refs/files-backend.c @@ -2311,7 +2311,7 @@ static int lock_ref_for_update(struct files_ref_store *refs, strbuf_addf(err, "cannot lock ref '%s': " "error reading reference", original_update_refname(update)); - ret = -1; + ret = TRANSACTION_GENERIC_ERROR; goto out; } } else if (check_old_oid(update, &lock->old_oid, err)) { -- 2.14.1.151.g45c1275a3.dirty
[PATCH v3 2/3] refs/files-backend: fix memory leak in lock_ref_for_update
After the previous patch, none of the functions we call hold on to `referent.buf`, so we can safely release the string buffer before returning. Signed-off-by: Martin Ågren --- v3: identical to v2 refs/files-backend.c | 31 ++++++++++++++++++++----------- 1 file changed, 20 insertions(+), 11 deletions(-) diff --git a/refs/files-backend.c b/refs/files-backend.c index 73615d869..a2b3df21b 100644 --- a/refs/files-backend.c +++ b/refs/files-backend.c @@ -2266,7 +2266,7 @@ static int lock_ref_for_update(struct files_ref_store *refs, struct strbuf referent = STRBUF_INIT; int mustexist = (update->flags & REF_HAVE_OLD) && !is_null_oid(&update->old_oid); - int ret; + int ret = 0; struct ref_lock *lock; files_assert_main_repository(refs, "lock_ref_for_update"); @@ -2278,7 +2278,7 @@ static int lock_ref_for_update(struct files_ref_store *refs, ret = split_head_update(update, transaction, head_ref, affected_refnames, err); if (ret) - return ret; + goto out; } ret = lock_raw_ref(refs, update->refname, mustexist, @@ -2292,7 +2292,7 @@ static int lock_ref_for_update(struct files_ref_store *refs, strbuf_addf(err, "cannot lock ref '%s': %s", original_update_refname(update), reason); free(reason); - return ret; + goto out; } update->backend_data = lock; @@ -2311,10 +2311,12 @@ static int lock_ref_for_update(struct files_ref_store *refs, strbuf_addf(err, "cannot lock ref '%s': " "error reading reference", original_update_refname(update)); - return -1; + ret = -1; + goto out; } } else if (check_old_oid(update, &lock->old_oid, err)) { - return TRANSACTION_GENERIC_ERROR; + ret = TRANSACTION_GENERIC_ERROR; + goto out; } } else { /* @@ -2328,13 +2330,15 @@ static int lock_ref_for_update(struct files_ref_store *refs, referent.buf, transaction, affected_refnames, err); if (ret) - return ret; + goto out; } } else { struct ref_update *parent_update; - if (check_old_oid(update, &lock->old_oid, err)) - return TRANSACTION_GENERIC_ERROR; + if (check_old_oid(update, &lock->old_oid, err)) { + ret = TRANSACTION_GENERIC_ERROR; + goto out; + } /* * If this
update is happening indirectly because of a @@ -2371,7 +2375,8 @@ static int lock_ref_for_update(struct files_ref_store *refs, "cannot update ref '%s': %s", update->refname, write_err); free(write_err); - return TRANSACTION_GENERIC_ERROR; + ret = TRANSACTION_GENERIC_ERROR; + goto out; } else { update->flags |= REF_NEEDS_COMMIT; } @@ -2385,10 +2390,14 @@ static int lock_ref_for_update(struct files_ref_store *refs, if (close_ref(lock)) { strbuf_addf(err, "couldn't close '%s.lock'", update->refname); - return TRANSACTION_GENERIC_ERROR; + ret = TRANSACTION_GENERIC_ERROR; + goto out; } } - return 0; + +out: + strbuf_release(); + return ret; } /* -- 2.14.1.151.g45c1275a3.dirty
[PATCH v3 1/3] refs/files-backend: add longer-scoped copy of string to list
split_symref_update() receives a string-pointer `referent` and adds it
to the list of `affected_refnames`. The list simply holds on to the
pointers it is given, it does not copy the strings and it does not ever
free them. The `referent` string in split_symref_update() belongs to a
string buffer in the caller. After we return, the string will be leaked.

In the next patch, we want to properly release the string buffer in the
caller, but we can't safely do so until we've made sure that
`affected_refnames` will not be holding on to a pointer to the string.
We could configure the list to handle its own resources, but it would
mean some alloc/free-churning. The list is already handling other
strings (through other code paths) which we do not need to worry about,
and we'd be memory-churning those strings too, completely unnecessary.

Observe that split_symref_update() creates a `new_update`-object through
ref_transaction_add_update(), after which `new_update->refname` is a
copy of `referent`. The difference is, this copy will be freed, and it
will be freed *after* `affected_refnames` has been cleared.

Rearrange the handling of `referent`, so that we don't add it directly
to `affected_refnames`. Instead, first just check whether `referent`
exists in the string list, and later add `new_update->refname`.

Helped-by: Michael Haggerty
Signed-off-by: Martin Ågren
---
v3: "O(lg N)" in comment; if-BUG() instead of assert()

 refs/files-backend.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index fccbc24ac..73615d869 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -2144,13 +2144,12 @@ static int split_symref_update(struct files_ref_store *refs,
 
 	/*
 	 * First make sure that referent is not already in the
-	 * transaction. This insertion is O(N) in the transaction
+	 * transaction. This check is O(lg N) in the transaction
 	 * size, but it happens at most once per symref in a
 	 * transaction.
 	 */
-	item = string_list_insert(affected_refnames, referent);
-	if (item->util) {
-		/* An entry already existed */
+	if (string_list_has_string(affected_refnames, referent)) {
+		/* An entry already exists */
 		strbuf_addf(err,
 			    "multiple updates for '%s' (including one "
 			    "via symref '%s') are not allowed",
@@ -2185,6 +2184,17 @@ static int split_symref_update(struct files_ref_store *refs,
 	update->flags |= REF_LOG_ONLY | REF_NODEREF;
 	update->flags &= ~REF_HAVE_OLD;
 
+	/*
+	 * Add the referent. This insertion is O(N) in the transaction
+	 * size, but it happens at most once per symref in a
+	 * transaction. Make sure to add new_update->refname, which will
+	 * be valid as long as affected_refnames is in use, and NOT
+	 * referent, which might soon be freed by our caller.
+	 */
+	item = string_list_insert(affected_refnames, new_update->refname);
+	if (item->util)
+		BUG("%s unexpectedly found in affected_refnames",
+		    new_update->refname);
 	item->util = new_update;
 
 	return 0;
-- 
2.14.1.151.g45c1275a3.dirty
Re: [PATCH] diff-highlight: Add clean target to Makefile
On Tue, Aug 29, 2017 at 12:23:11PM +0100, Daniel Watkins wrote:

> Now that `make` produces a file, we should have a clean target to remove
> it.
> [...]
> +clean:
> +	$(RM) diff-highlight
> +

Makes sense. Thanks!

-Peff
Re: [PATCH 2/3] merge-recursive: remove return value from get_files_dirs
On Tue, Aug 29, 2017 at 03:58:00PM +, Kevin Willford wrote: > > > The return value of the get_files_dirs call is never being used. > > > Looking at the history of the file and it was originally only > > > being used for debug output statements. Also when > > > read_tree_recursive return value is non zero it is changed to > > > zero. This leads me to believe that it doesn't matter if > > > read_tree_recursive gets an error. > > > > Or that the function is buggy. :) > > That was one of my questions as well. Should read_tree_recursive > be propagating a -1 and merge_trees be checking for that and bail > when the call to get_files_dirs return is < 0? I made a commit with > this change and ran the tests and they all still passed so either this > return really doesn't matter or there are not sufficient tests covering > it. Right, in a normal flow I'd say that it should propagate the -1, etc. But since the only possible error is parse_tree() failing, which happens in a corrupt repository, I think it would be OK to just die(). That saves having to deal with error propagation. It's not very graceful, perhaps, but the important thing is that we don't quietly ignore the error. > I went with this change because it was not changing any of the > current functionality and if we find a case where it matters that > read_tree_recursive fails due to bad tree or something else we > can address it then. I think it's worth doing while we're thinking about it now, but it certainly can be a separate patch. -Peff
RE: [PATCH 2/3] merge-recursive: remove return value from get_files_dirs
> > On Mon, Aug 28, 2017 at 02:28:28PM -0600, Kevin Willford wrote: > > > The return value of the get_files_dirs call is never being used. > > Looking at the history of the file and it was originally only > > being used for debug output statements. Also when > > read_tree_recursive return value is non zero it is changed to > > zero. This leads me to believe that it doesn't matter if > > read_tree_recursive gets an error. > > Or that the function is buggy. :) That was one of my questions as well. Should read_tree_recursive be propagating a -1 and merge_trees be checking for that and bail when the call to get_files_dirs return is < 0? I made a commit with this change and ran the tests and they all still passed so either this return really doesn't matter or there are not sufficient tests covering it. I went with this change because it was not changing any of the current functionality and if we find a case where it matters that read_tree_recursive fails due to bad tree or something else we can address it then. > > I'm tempted to say that we should probably die() when > read_tree_recursive fails. This should only happen if we fail to parse > the tree, or if our callback (save_files_dirs here) returns failure, and > the latter looks like it never happens. > > > Since the debug output has been removed and the caller isn't > > checking the return value there is no reason to keep calulating > > and returning a value. > > Agreed, and I'm happy to see dead code go. > > Minor nit: s/calulating/calculating/ in your commit message. When will that spell checker for git messages be ready? ;) > > -Peff
Re: [PATCH v5 35/40] Add Documentation/technical/external-odb.txt
On Mon, Aug 28, 2017 at 8:59 PM, Ben Peart wrote:
>
> On 8/3/2017 5:19 AM, Christian Couder wrote:
>>
>> +Helpers
>> +=======
>> +
>> +ODB helpers are commands that have to be registered using either the
>> +"odb.<name>.subprocessCommand" or the "odb.<name>.scriptCommand"
>> +config variables.
>> +
>> +Registering such a command tells Git that an external odb called
>> +<name> exists and that the registered command should be used to
>> +communicate with it.
>
> What order are the odb handlers called? Are they called before or after
> the regular object store code for loose, pack and alternates? Is the
> order configurable?

For get_*_object instructions the regular code is called before the odb
helpers. (So the odb helper code is at the end of stat_sha1_file() and
of open_sha1_file() in sha1_file.c.)

For put_*_object instructions the regular code is called after the odb
helpers. (So the odb helper code is at the beginning of
write_sha1_file() in sha1_file.c.)

And no, this order is not configurable, but of course it could be made
configurable.

>> + - 'get_direct <sha1>'
>> +
>> +This instruction is similar to the other 'get_*' instructions except
>> +that no object should be sent from the helper to Git. Instead the
>> +helper should directly write the requested object into a loose object
>> +file in the ".git/objects" directory.
>> +
>> +After the helper has sent the "status=success" packet and the
>> +following flush packet in process mode, or after it has exited in the
>> +script mode, Git should look up again a loose object file with the
>> +requested sha1.
>
> When will git call get_direct vs one of the other get_* functions?

It is called just before exiting when git cannot find an object. It is
not exactly at the same place as other get_* instructions as I tried to
reuse your code and as it looks like it makes it easier to retry the
regular code after the odb helper code.

> Could the functionality of enabling a helper to populate objects into
> the regular object store be provided by having a ODB helper that
> returned the object data as requested by get_git_obj or get_raw_obj but
> also stored it in the regular object store as a loose object (or pack
> file) for future calls?

I am not sure I understand what you mean. If a helper returns the
object data as requested by get_git_obj or get_raw_obj, then currently
Git will itself store the object locally in its regular object store,
so it is redundant for the helper to also store or try to store the
object in the regular object store.
Re: git describe and "the smallest number of commits possible"
Stefan Beller venit, vidit, dixit 28.08.2017 20:24:
> On Sat, Aug 26, 2017 at 7:47 AM, Kévin Le Gouguec wrote:
>> Hi,
>>
>> I've asked this question on the git-users Google Groups list[1], and
>> while the answers there were interesting, I still cannot figure
>> whether my problem comes from an actual bug, a misleading manpage, my
>> inability to understand the code and/or the manpage, or a combination
>> of the three.
>>
>> I noticed this problem on 2.1.4 (Debian oldstable); I can reproduce it
>> on next (7ef03bb281b2220d0f2414365e4949beb2337042). Quoting
>> git-describe(1)'s SEARCH STRATEGY section:
>>
>>> If multiple tags were found during the walk then the tag which has
>>> the fewest commits different from the input commit-ish will be
>>> selected and output. Here fewest commits different is defined as the
>>> number of commits which would be shown by `git log tag..input` will
>>> be the smallest number of commits possible.
>
> Maybe relevant
> https://public-inbox.org/git/20160421113004.ga3...@aepfle.de/
> (specifically the discussion after this patch, it sheds light on the
> heuristics used, though your case here doesn't use these heuristics)

git describe is driving me crazy sometimes, too - both the results and
the code ;)

[long read with lots of debug output follows, sorry]

What is going on here is the following:

git describe --tags decides based on "depth".

git tag --merged dcc3ef3ee7 lists both emacs-25.1 and emacs-25.2 (among
many others).

git rev-list --count emacs-25.1..dcc3ef3ee7
5126
git rev-list --count emacs-25.2..dcc3ef3ee7
4793

So, you would rightly expect 25.2 to be preferred over 25.1.

Now, "git describe" does not do a "rev-list --count" for each tag but,
instead, walks back from dcc3ef3ee7 and notes when it reaches a tagged
commit, and how many steps it takes to get there. The information
"reachable by the i-th tag that I encountered" is kept by a field of
width 27, which limits the --candidates values.

Once the walk passes more tags it is aborted. Then the possible matches
are sorted. Then the depth calculation is finished. Funny order of
actions, isn't it? Smells like BUG. At least it makes the debug output
meaningless.

Here's the debug output before and after finish_depth_calculation():

searching to describe HEAD
 lightweight        4069 emacs-25.1
 lightweight        4086 emacs-25.1-rc2
 lightweight        4126 emacs-25.1-rc1
 annotated          4185 emacs-25.2
 annotated          4220 emacs-25.2-rc2
 lightweight        4226 emacs-25.0.95
 annotated          4236 emacs-25.2-rc1
 annotated          4280 emacs-25.1.91
 lightweight        4305 emacs-25.0.94
 lightweight        4329 emacs-25.1.90
traversed 4462 commits
more than 10 tags found; listed 10 most recent
gave up search at 5c587fdff164e8b90beb47f6da64b4884290e40a

[finish_depth_calculation()]

 lightweight      129847 emacs-25.1
 lightweight        4086 emacs-25.1-rc2
 lightweight        4126 emacs-25.1-rc1
 annotated          4185 emacs-25.2
 annotated          4220 emacs-25.2-rc2
 lightweight        4226 emacs-25.0.95
 annotated          4236 emacs-25.2-rc1
 annotated          4280 emacs-25.1.91
 lightweight        4305 emacs-25.0.94
 lightweight        4329 emacs-25.1.90
traversed 130257 commits
more than 10 tags found; listed 10 most recent
gave up search at 5c587fdff164e8b90beb47f6da64b4884290e40a
emacs-25.1-129847-gdcc3ef3ee7

Now, I can't claim that I can wrap my head around
finish_depth_calculation() and the way depth is calculated here, let
alone those huge numbers (probably per not-contained tag or so), but
that order of operations is clearly wrong.

Now, if we fix finish_depth_calculation() to actually do the work for
all possible candidates, and the sort afterwards, the result is:

LANG=C devgit describe --tags --debug --candidates=10
searching to describe HEAD
 lightweight      129847 emacs-25.1
 lightweight      129864 emacs-25.1-rc2
 lightweight      129904 emacs-25.1-rc1
 annotated        129981 emacs-25.2
 lightweight      130004 emacs-25.0.95
 annotated        130016 emacs-25.2-rc2
 annotated        130032 emacs-25.2-rc1
 annotated        130076 emacs-25.1.91
 lightweight      130083 emacs-25.0.94
 lightweight      130125 emacs-25.1.90
traversed 130257 commits
more than 10 tags found; listed 10 most recent
gave up search at 5c587fdff164e8b90beb47f6da64b4884290e40a
emacs-25.1-129847-gdcc3ef3ee7

LANG=C devgit describe --tags --debug --candidates=100
searching to describe HEAD
 annotated         13470 emacs-24.5-rc3-fixed
 lightweight       13471 emacs-24.5-rc3
 lightweight       13476 emacs-24.5-rc2
 lightweight       13482 emacs-24.5-rc1
 lightweight       13511 emacs-24.4.91
 lightweight       13554 emacs-24.4.90
 lightweight       13899 emacs-24.4
 lightweight       13902 emacs-24.4-rc1
 lightweight       13959 emacs-24.3.94
 annotated         13972 mh-e-8.6
 lightweight       14049 emacs-24.3.93
 lightweight       14176 emacs-24.3.92
 lightweight      129847 emacs-25.1
 lightweight      129864 emacs-25.1-rc2
 lightweight      129904 emacs-25.1-rc1
 lightweight      129971 emacs-25.0.92
[PATCH] diff-highlight: Add clean target to Makefile
Now that `make` produces a file, we should have a clean target to
remove it.

Signed-off-by: Daniel Watkins
---
 contrib/diff-highlight/Makefile | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/contrib/diff-highlight/Makefile b/contrib/diff-highlight/Makefile
index fbf5c5824..f2be7cc92 100644
--- a/contrib/diff-highlight/Makefile
+++ b/contrib/diff-highlight/Makefile
@@ -17,4 +17,7 @@ shebang.perl: FORCE
 test: all
 	$(MAKE) -C t
 
+clean:
+	$(RM) diff-highlight
+
 .PHONY: FORCE
-- 
2.14.1
Re: [PATCH v2 2/2] refs/files-backend: fix memory leak in lock_ref_for_update
On 29 August 2017 at 10:39, Michael Haggerty wrote:
> On 08/28/2017 10:32 PM, Martin Ågren wrote:
>> After the previous patch, none of the functions we call hold on to
>> `referent.buf`, so we can safely release the string buffer before
>> returning.
>
> This patch looks good to me, but I did notice a pre-existing problem in
> the area...
>
>> ---
>>  refs/files-backend.c | 31 ++++++++++++++++++++-----------
>>  1 file changed, 20 insertions(+), 11 deletions(-)
>>
>> diff --git a/refs/files-backend.c b/refs/files-backend.c
>> index bdb0e22e5..15f34b10e 100644
>> --- a/refs/files-backend.c
>> +++ b/refs/files-backend.c
>> [...]
>> @@ -2305,10 +2305,12 @@ static int lock_ref_for_update(struct files_ref_store *refs,
>>  				strbuf_addf(err, "cannot lock ref '%s': "
>>  					    "error reading reference",
>>  					    original_update_refname(update));
>> -				return -1;
>> +				ret = -1;
>> +				goto out;
>
> It is incorrect to return -1 here. First of all, stylistically, the
> return value should be a symbolic constant. But in fact, it should be
> returning `TRANSACTION_GENERIC_ERROR` here, whereas -1 is
> `TRANSACTION_NAME_CONFLICT`. So the code is not just stylistically
> wrong; it is functionally wrong.
>
> I know that this is not your mistake, but would you like to add another
> patch to your series to fix this up? I'd do it myself, but it's a little
> bit awkward because the fix will conflict with your patch.

Sure. I'll send out a v3 later today. I'll fix this in a third patch,
and I'll also address your comments on the first patch. Thanks a lot.

Martin
Re: [PATCH v5 35/40] Add Documentation/technical/external-odb.txt
On Fri, Aug 25, 2017 at 11:23 PM, Jonathan Tan wrote:
> On Fri, 25 Aug 2017 08:14:08 +0200
> Christian Couder wrote:
>
>> As Git is used by more and more people having different needs, I
>> think it is not realistic to expect that we can optimize its object
>> storage for all these different needs. So a better strategy is to just
>> let them store objects in external stores.
> [snip]
>> About these many use cases, I gave the "really big binary files"
>> example which is why Git LFS exists (and which GitLab is interested in
>> better solving), and the "really big number of files that are fetched
>> only as needed" example which Microsoft is interested in solving. I
>> could also imagine that some people have both big text files and big
>> binary files in which case the "core.bigfilethreshold" might not work
>> well, or that some people already have blobs in some different stores
>> (like HTTP servers, Docker registries, artifact stores, ...) and want
>> to fetch them from there as much as possible.
>
> Thanks for explaining the use cases - this makes sense, especially the
> last one which motivates the different modes for the "get" command
> (return raw bytes vs populating the Git repository with loose/packed
> objects).

Thanks for this comment. Now the beginning of the "Purpose" section
looks like this:

The purpose of this mechanism is to make possible to handle Git
objects, especially blobs, in much more flexible ways.

Currently Git can store its objects only in the form of loose objects
in separate files or packed objects in a pack file. These existing
object stores cannot be easily optimized for many different kind of
contents.

So the current stores are not flexible enough for some important use
cases like handling really big binary files or handling a really big
number of files that are fetched only as needed. And it is not
realistic to expect that Git could fully natively handle many of such
use cases.
Git would need to natively implement different internal stores which would be a huge burden and which could lead to re-implement things like HTTP servers, Docker registries or artifact stores that already exist outside Git. >> And then letting people >> use different stores can make clones or fetches restartable which >> would solve another problem people have long been complaining about... > > This is unrelated to the rest of my e-mail, but out of curiosity, how > would a different store make clones or fetches restartable? Do you mean > that Git would invoke a "fetch" command through the ODB protocol instead > of using its own native protocol? Yeah, the idea is that during a clone (or a fetch), Git could first fetch some refs with "meta information", for example refs/odb/magic/* (where "magic" is the odb name) which would tell if some (new) bundles are available. If there are (new) bundles available then during the 'init' instruction, which takes place just after this first fetch, the external odb helper will notice that and "fetch" the (new) bundles using a restartable protocol, for example HTTP. If something goes wrong when the helper "fetches" a bundle, the helper could force the clone (or the fetch) to error out (after maybe retrying), and when the user (or the helper itself) tries again to clone (or fetch), the helper would restart its bundle "fetch" (using the restartable protocol). When this "fetch" eventually succeeds, then the helper will unbundle what it received, and then give back control to the second regular part of the clone (or fetch). This regular part of the clone (or fetch) will then try to fetch the usual refs, but as the unbundling has already updated the content of the usual refs (as well as the object stores) this fetch will find that everything is up-to-date. 
Ok, maybe everything is not quite up-to-date and there are still things to fetch, but anyway the biggest part of the clone (or fetch) has already been made using a restartable protocol, so we are doing much better than if we are not restartable at all. There are examples in t0430 at the end of the patch series of restartable clones. They don't really test that one can indeed restart a clone, but they show that things "just work" when `git clone` is split into an initial fetch (that updates "meta information" refs), then the 'init' instruction sent to an helper (that fetches and unbundles a bundle based on the "meta information" refs) and then the regular part of the clone. >> >> +Furthermore many improvements that are dependent on specific setups >> >> +could be implemented in the way Git objects are managed if it was >> >> +possible to customize how the Git objects are handled. For example a >> >> +restartable clone using the bundle mechanism has often been requested, >> >> +but implementing that would go against the current strict rules under >> >> +which the Git objects are currently handled. >> > >> > So in this example, you would
Re: [PATCH 3/3] merge-recursive: change current file dir string_lists to hashmap
On Mon, Aug 28, 2017 at 02:28:29PM -0600, Kevin Willford wrote:

> The code was using two string_lists, one for the directories and
> one for the files. The code never checks the lists independently
> so we should be able to only use one list. The string_list also
> is a O(log n) for lookup and insertion. Switching this to use a
> hashmap will give O(1) which will save some time when there are
> millions of paths that will be checked.

Makes sense. I think when this code was written we didn't have the
generalized hashmap.c, which is why it relies on string lists.

> diff --git a/merge-recursive.c b/merge-recursive.c
> index d47f40ea81..ef4fe5f09f 100644
> --- a/merge-recursive.c
> +++ b/merge-recursive.c
> @@ -24,6 +24,26 @@
>  #include "dir.h"
>  #include "submodule.h"
>  
> +struct path_hashmap_entry {
> +	struct hashmap_entry e;
> +	char path[FLEX_ARRAY];
> +};

This seems like a good use of FLEX_ARRAY.

> +static unsigned int (*pathhash)(const char *path);
> +static int (*pathcmp)(const char *a, const char *b);

These global function pointers are sufficiently out of the ordinary
that it might be worth adding a comment about how they're intended to
be set and used.

I also suspect they could be stuffed into the merge_options struct
along with the hashmap itself, and then passed via the cmp_data
pointer (which is relatively new-ish; I won't be surprised if you
wrote this patch before that existed and forward-ported it).

> +static int path_hashmap_cmp(const void *cmp_data,
> +			    const void *entry,
> +			    const void *entry_or_key,
> +			    const void *keydata)
> +{
> +	const struct path_hashmap_entry *a = entry;
> +	const struct path_hashmap_entry *b = entry_or_key;
> +	const char *key = keydata;
> +
> +	return pathcmp(a->path, key ? key : b->path);
> +}

The hashmap interface makes its cmp functions a bit tricky to write,
but this all looks correct to me.

> @@ -642,6 +662,8 @@ static void add_flattened_path(struct strbuf *out, const char *s)
>  
>  static char *unique_path(struct merge_options *o, const char *path, const char *branch)
>  {
> +	struct path_hashmap_entry dummy;
> +	struct path_hashmap_entry *entry;
>  	struct strbuf newpath = STRBUF_INIT;
>  	int suffix = 0;
>  	size_t base_len;
> @@ -650,14 +672,17 @@ static char *unique_path(struct merge_options *o, const char *path, const char *
>  	add_flattened_path(&newpath, branch);
>  
>  	base_len = newpath.len;
> -	while (string_list_has_string(&o->current_file_set, newpath.buf) ||
> -	       string_list_has_string(&o->current_directory_set, newpath.buf) ||
> +	hashmap_entry_init(&dummy, pathhash(newpath.buf));
> +	while (hashmap_get(&o->current_file_dir_set, &dummy, newpath.buf) ||
>  	       (!o->call_depth && file_exists(newpath.buf))) {
>  		strbuf_setlen(&newpath, base_len);
>  		strbuf_addf(&newpath, "_%d", suffix++);
> +		hashmap_entry_init(&dummy, pathhash(newpath.buf));
>  	}

I think you can use hashmap_get_from_hash() here to avoid dealing with
"dummy" yourself.

> @@ -1941,8 +1966,7 @@ int merge_trees(struct merge_options *o,
>  	if (unmerged_cache()) {
>  		struct string_list *entries, *re_head, *re_merge;
>  		int i;
> -		string_list_clear(&o->current_file_set, 1);
> -		string_list_clear(&o->current_directory_set, 1);
> +		hashmap_init(&o->current_file_dir_set, path_hashmap_cmp, NULL, 512);

This hunk seems funny. string_list_clear() assumes the input is an
already-initialized string list, and we drop all of its entries. But
hashmap_init assumes we have an _uninitialized_ chunk of memory, and
throws away whatever was there.

Do we ever hit this code path when there's actually something in the
hashmap? I'd think we would want to hashmap_init() at the same place we
used to string_list_init(). I.e., in init_merge_options.

> @@ -1978,6 +2002,8 @@ int merge_trees(struct merge_options *o,
>  	string_list_clear(re_head, 0);
>  	string_list_clear(entries, 1);
>  
> +	hashmap_free(&o->current_file_dir_set, 1);

Ah, I think this answers my question above.
We only care about the contents during this one bit of code, so we do an init/free pair. And since there's no "clear" function for a hashmap that drops the entries without resetting things like the comparison function, we _have_ to call hashmap_init in the earlier hunk. That does make me wonder if it would be sane to just declare the hashmap local to this block, which would make its lifetime much more clear. I guess having it in merge_options is convenient for passing it around, though. > @@ -2179,8 +2205,8 @@ void init_merge_options(struct merge_options *o) > [...] The rest of the patch looks like a straight-forward conversion. With the understanding I have above, I think everything here is correct, but there are a few minor bits it might be worth addressing to make the
Re: [PATCH v2 2/2] refs/files-backend: fix memory leak in lock_ref_for_update
On 08/28/2017 10:32 PM, Martin Ågren wrote:
> After the previous patch, none of the functions we call hold on to
> `referent.buf`, so we can safely release the string buffer before
> returning.

This patch looks good to me, but I did notice a pre-existing problem in
the area...

> ---
>  refs/files-backend.c | 31 ++++++++++++++++++++-----------
>  1 file changed, 20 insertions(+), 11 deletions(-)
>
> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index bdb0e22e5..15f34b10e 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> [...]
> @@ -2305,10 +2305,12 @@ static int lock_ref_for_update(struct files_ref_store *refs,
>  				strbuf_addf(err, "cannot lock ref '%s': "
>  					    "error reading reference",
>  					    original_update_refname(update));
> -				return -1;
> +				ret = -1;
> +				goto out;

It is incorrect to return -1 here. First of all, stylistically, the
return value should be a symbolic constant. But in fact, it should be
returning `TRANSACTION_GENERIC_ERROR` here, whereas -1 is
`TRANSACTION_NAME_CONFLICT`. So the code is not just stylistically
wrong; it is functionally wrong.

I know that this is not your mistake, but would you like to add another
patch to your series to fix this up? I'd do it myself, but it's a little
bit awkward because the fix will conflict with your patch.

Thanks,
Michael
Re: [PATCH v2 1/2] refs/files-backend: add longer-scoped copy of string to list
On 08/28/2017 10:32 PM, Martin Ågren wrote:
> split_symref_update() receives a string-pointer `referent` and adds it
> to the list of `affected_refnames`. The list simply holds on to the
> pointers it is given, it does not copy the strings and it never frees
> them. The `referent` string in split_symref_update() belongs to a string
> buffer in the caller. After we return, the string will be leaked.
>
> In the next patch, we want to properly release the string buffer in the
> caller, but we can't safely do so until we've made sure that
> `affected_refnames` will not be holding on to a pointer to the string.
> We could configure the list to handle its own resources, but it would
> mean some alloc/free-churning. The list is already handling other
> strings (through other code paths) which we do not need to worry about,
> and we'd be memory-churning those strings too, completely unnecessary.
>
> Observe that split_symref_update() creates a `new_update`-object and
> that `new_update->refname` is then a copy of `referent`. The difference
> is, this copy will be freed, and it will be freed *after*
> `affected_refnames` has been cleared.
>
> Rearrange the handling of `referent`, so that we don't add it directly
> to `affected_refnames`. Instead, first just check whether `referent`
> exists in the string list, and later add `new_update->refname`.
>
> Helped-by: Michael Haggerty
> Signed-off-by: Martin Ågren
> ---
> Thanks Junio and Michael for your comments on the first version. This
> first patch is now completely different and much much better (thanks
> Michael!). The commit message should also be better (sorry Junio...).
> The second one has a new commit message, but the diff is the same.
>
> Martin
>
>  refs/files-backend.c | 16 ++++++++++++----
>  1 file changed, 12 insertions(+), 4 deletions(-)
>
> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index 5cca55510..bdb0e22e5 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> @@ -2140,13 +2140,12 @@ static int split_symref_update(struct files_ref_store *refs,
>  
>  	/*
>  	 * First make sure that referent is not already in the
> -	 * transaction. This insertion is O(N) in the transaction
> +	 * transaction. This check is O(N) in the transaction

The check is not O(N); it is O(lg N) because it is implemented via a
binary search in the (sorted) `affected_refnames`. The insertion below
is still O(N), because it has to shift entries later in the list to the
right to make room for the new entry.

>  	 * size, but it happens at most once per symref in a
>  	 * transaction.
>  	 */
> -	item = string_list_insert(affected_refnames, referent);
> -	if (item->util) {
> -		/* An entry already existed */
> +	if (string_list_has_string(affected_refnames, referent)) {
> +		/* An entry already exists */
>  		strbuf_addf(err,
>  			    "multiple updates for '%s' (including one "
>  			    "via symref '%s') are not allowed",
> @@ -2181,6 +2180,15 @@ static int split_symref_update(struct files_ref_store *refs,
>  	update->flags |= REF_LOG_ONLY | REF_NODEREF;
>  	update->flags &= ~REF_HAVE_OLD;
>  
> +	/*
> +	 * Add the referent. This insertion is O(N) in the transaction
> +	 * size, but it happens at most once per symref in a
> +	 * transaction. Make sure to add new_update->refname, which will
> +	 * be valid as long as affected_refnames is in use, and NOT
> +	 * referent, which might soon be freed by our caller.
> +	 */
> +	item = string_list_insert(affected_refnames, new_update->refname);
> +	assert(!item->util);

We generally avoid using `assert()`. It would be preferable to use

	if (item->util)
		BUG("%s unexpectedly found in affected_refnames",
		    new_update->refname);

>  	item->util = new_update;
>  
>  	return 0;
>

Otherwise, looks good!

Michael
[PATCH 03/10] packed_ref_store: implement reference transactions
Implement the methods needed to support reference transactions for the
packed-refs backend. The new methods are not yet used.

Signed-off-by: Michael Haggerty
---
 refs/packed-backend.c | 313 ++++++++++++++++++++++++++++++++++++++++++++++-
 refs/packed-backend.h |   9 ++
 2 files changed, 319 insertions(+), 3 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 7e348feac3..d19b3bfba5 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -744,25 +744,332 @@ static int packed_init_db(struct ref_store *ref_store, struct strbuf *err)
 	return 0;
 }
 
+/*
+ * Write the packed-refs from the cache to the packed-refs tempfile,
+ * incorporating any changes from `updates`. `updates` must be a
+ * sorted string list whose keys are the refnames and whose util
+ * values are `struct ref_update *`. On error, rollback the tempfile,
+ * write an error message to `err`, and return a nonzero value.
+ *
+ * The packfile must be locked before calling this function and will
+ * remain locked when it is done.
+ */
+static int write_with_updates(struct packed_ref_store *refs,
+			      struct string_list *updates,
+			      struct strbuf *err)
+{
+	struct ref_iterator *iter = NULL;
+	size_t i;
+	int ok;
+	FILE *out;
+	struct strbuf sb = STRBUF_INIT;
+	char *packed_refs_path;
+
+	if (!is_lock_file_locked(&refs->lock))
+		die("BUG: write_with_updates() called while unlocked");
+
+	/*
+	 * If packed-refs is a symlink, we want to overwrite the
+	 * symlinked-to file, not the symlink itself. Also, put the
+	 * staging file next to it:
+	 */
+	packed_refs_path = get_locked_file_path(&refs->lock);
+	strbuf_addf(&sb, "%s.new", packed_refs_path);
+	free(packed_refs_path);
+	if (create_tempfile(&refs->tempfile, sb.buf) < 0) {
+		strbuf_addf(err, "unable to create file %s: %s",
+			    sb.buf, strerror(errno));
+		strbuf_release(&sb);
+		return -1;
+	}
+	strbuf_release(&sb);
+
+	out = fdopen_tempfile(&refs->tempfile, "w");
+	if (!out) {
+		strbuf_addf(err, "unable to fdopen packed-refs tempfile: %s",
+			    strerror(errno));
+		goto error;
+	}
+
+	if (fprintf(out, "%s", PACKED_REFS_HEADER) < 0)
+		goto write_error;
+
+	/*
+	 * We iterate in parallel through the current list of refs and
+	 * the list of updates, processing an entry from at least one
+	 * of the lists each time through the loop. When the current
+	 * list of refs is exhausted, set iter to NULL. When the list
+	 * of updates is exhausted, leave i set to updates->nr.
+	 */
+	iter = packed_ref_iterator_begin(&refs->base, "",
+					 DO_FOR_EACH_INCLUDE_BROKEN);
+	if ((ok = ref_iterator_advance(iter)) != ITER_OK)
+		iter = NULL;
+
+	i = 0;
+
+	while (iter || i < updates->nr) {
+		struct ref_update *update = NULL;
+		int cmp;
+
+		if (i >= updates->nr) {
+			cmp = -1;
+		} else {
+			update = updates->items[i].util;
+
+			if (!iter)
+				cmp = +1;
+			else
+				cmp = strcmp(iter->refname, update->refname);
+		}
+
+		if (!cmp) {
+			/*
+			 * There is both an old value and an update
+			 * for this reference. Check the old value if
+			 * necessary:
+			 */
+			if ((update->flags & REF_HAVE_OLD)) {
+				if (is_null_oid(&update->old_oid)) {
+					strbuf_addf(err, "cannot update ref '%s': "
+						    "reference already exists",
+						    update->refname);
+					goto error;
+				} else if (oidcmp(&update->old_oid, iter->oid)) {
+					strbuf_addf(err, "cannot update ref '%s': "
+						    "is at %s but expected %s",
+						    update->refname,
+						    oid_to_hex(iter->oid),
+						    oid_to_hex(&update->old_oid));
+					goto error;
+				}
+			}
+
+			/* Now figure out what to use for the new value: */
+			if ((update->flags & REF_HAVE_NEW)) {
+				/*
+				 * The
[PATCH 02/10] struct ref_transaction: add a place for backends to store data
`packed_ref_store` is going to want to store some transaction-wide
data, so make a place for it.

Signed-off-by: Michael Haggerty
---
 refs/refs-internal.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index b02dc5a7e3..d7d344de73 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -242,6 +242,7 @@ struct ref_transaction {
 	size_t alloc;
 	size_t nr;
 	enum ref_transaction_state state;
+	void *backend_data;
 };
 
 /*
-- 
2.14.1
[PATCH 10/10] files_transaction_finish(): delete reflogs before references
If the deletion steps unexpectedly fail, it is less bad to leave a
reference without its reflog than it is to leave a reflog without its
reference, since the latter is an invalid repository state.

Signed-off-by: Michael Haggerty
---
 refs/files-backend.c | 35 +--
 1 file changed, 21 insertions(+), 14 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 4f4c47b9db..36f81b4f28 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -2630,6 +2630,27 @@ static int files_transaction_finish(struct ref_store *ref_store,
 		}
 	}
 
+	/*
+	 * Now that updates are safely completed, we can perform
+	 * deletes. First delete the reflogs of any references that
+	 * will be deleted, since (in the unexpected event of an
+	 * error) leaving a reference without a reflog is less bad
+	 * than leaving a reflog without a reference (the latter is a
+	 * mildly invalid repository state):
+	 */
+	for (i = 0; i < transaction->nr; i++) {
+		struct ref_update *update = transaction->updates[i];
+		if (update->flags & REF_DELETING &&
+		    !(update->flags & REF_LOG_ONLY) &&
+		    !(update->flags & REF_ISPRUNING)) {
+			strbuf_reset(&sb);
+			files_reflog_path(refs, &sb, update->refname);
+			if (!unlink_or_warn(sb.buf))
+				try_remove_empty_parents(refs, update->refname,
+							 REMOVE_EMPTY_PARENTS_REFLOG);
+		}
+	}
+
 	/*
 	 * Perform deletes now that updates are safely completed.
 	 *
@@ -2666,20 +2687,6 @@ static int files_transaction_finish(struct ref_store *ref_store,
 		}
 	}
 
-	/* Delete the reflogs of any references that were deleted: */
-	for (i = 0; i < transaction->nr; i++) {
-		struct ref_update *update = transaction->updates[i];
-		if (update->flags & REF_DELETING &&
-		    !(update->flags & REF_LOG_ONLY) &&
-		    !(update->flags & REF_ISPRUNING)) {
-			strbuf_reset(&sb);
-			files_reflog_path(refs, &sb, update->refname);
-			if (!unlink_or_warn(sb.buf))
-				try_remove_empty_parents(refs, update->refname,
-							 REMOVE_EMPTY_PARENTS_REFLOG);
-		}
-	}
-
 	clear_loose_ref_cache(refs);
 
 cleanup:
-- 
2.14.1
[PATCH 09/10] packed-backend: rip out some now-unused code
Now the outside world interacts with the packed ref store only via
the generic refs API plus a few lock-related functions. This allows
us to delete some functions that are no longer used, thereby
completing the encapsulation of the packed ref store.

Signed-off-by: Michael Haggerty
---
 refs/packed-backend.c | 193 --
 refs/packed-backend.h |   8 ---
 2 files changed, 201 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 83a088118f..90f44c1fbb 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -91,19 +91,6 @@ struct ref_store *packed_ref_store_create(const char *path,
 	return ref_store;
 }
 
-/*
- * Die if refs is not the main ref store. caller is used in any
- * necessary error messages.
- */
-static void packed_assert_main_repository(struct packed_ref_store *refs,
-					  const char *caller)
-{
-	if (refs->store_flags & REF_STORE_MAIN)
-		return;
-
-	die("BUG: operation %s only allowed for main ref store", caller);
-}
-
 /*
  * Downcast `ref_store` to `packed_ref_store`. Die if `ref_store` is
  * not a `packed_ref_store`. Also die if `packed_ref_store` doesn't
@@ -321,40 +308,6 @@ static struct ref_dir *get_packed_refs(struct packed_ref_store *refs)
 	return get_packed_ref_dir(get_packed_ref_cache(refs));
 }
 
-/*
- * Add or overwrite a reference in the in-memory packed reference
- * cache. This may only be called while the packed-refs file is locked
- * (see packed_refs_lock()). To actually write the packed-refs file,
- * call commit_packed_refs().
- */
-void add_packed_ref(struct ref_store *ref_store,
-		    const char *refname, const struct object_id *oid)
-{
-	struct packed_ref_store *refs =
-		packed_downcast(ref_store, REF_STORE_WRITE,
-				"add_packed_ref");
-	struct ref_dir *packed_refs;
-	struct ref_entry *packed_entry;
-
-	if (!is_lock_file_locked(&refs->lock))
-		die("BUG: packed refs not locked");
-
-	if (check_refname_format(refname, REFNAME_ALLOW_ONELEVEL))
-		die("Reference has invalid format: '%s'", refname);
-
-	packed_refs = get_packed_refs(refs);
-	packed_entry = find_ref_entry(packed_refs, refname);
-	if (packed_entry) {
-		/* Overwrite the existing entry: */
-		oidcpy(&packed_entry->u.value.oid, oid);
-		packed_entry->flag = REF_ISPACKED;
-		oidclr(&packed_entry->u.value.peeled);
-	} else {
-		packed_entry = create_ref_entry(refname, oid, REF_ISPACKED);
-		add_ref_entry(packed_refs, packed_entry);
-	}
-}
-
 /*
  * Return the ref_entry for the given refname from the packed
  * references. If it does not exist, return NULL.
@@ -592,152 +545,6 @@ int packed_refs_is_locked(struct ref_store *ref_store)
 static const char PACKED_REFS_HEADER[] =
 	"# pack-refs with: peeled fully-peeled \n";
 
-/*
- * Write the current version of the packed refs cache from memory to
- * disk. The packed-refs file must already be locked for writing (see
- * packed_refs_lock()). Return zero on success. On errors, rollback
- * the lockfile, write an error message to `err`, and return a nonzero
- * value.
- */
-int commit_packed_refs(struct ref_store *ref_store, struct strbuf *err)
-{
-	struct packed_ref_store *refs =
-		packed_downcast(ref_store, REF_STORE_WRITE | REF_STORE_MAIN,
-				"commit_packed_refs");
-	struct packed_ref_cache *packed_ref_cache =
-		get_packed_ref_cache(refs);
-	int ok;
-	int ret = -1;
-	struct strbuf sb = STRBUF_INIT;
-	FILE *out;
-	struct ref_iterator *iter;
-	char *packed_refs_path;
-
-	if (!is_lock_file_locked(&refs->lock))
-		die("BUG: commit_packed_refs() called when unlocked");
-
-	/*
-	 * If packed-refs is a symlink, we want to overwrite the
-	 * symlinked-to file, not the symlink itself. Also, put the
-	 * staging file next to it:
-	 */
-	packed_refs_path = get_locked_file_path(&refs->lock);
-	strbuf_addf(&sb, "%s.new", packed_refs_path);
-	if (create_tempfile(&refs->tempfile, sb.buf) < 0) {
-		strbuf_addf(err, "unable to create file %s: %s",
-			    sb.buf, strerror(errno));
-		strbuf_release(&sb);
-		goto out;
-	}
-	strbuf_release(&sb);
-
-	out = fdopen_tempfile(&refs->tempfile, "w");
-	if (!out) {
-		strbuf_addf(err, "unable to fdopen packed-refs tempfile: %s",
-			    strerror(errno));
-		goto error;
-	}
-
-	if (fprintf(out, "%s", PACKED_REFS_HEADER) < 0) {
-		strbuf_addf(err, "error writing to %s: %s",
-			    get_tempfile_path(&refs->tempfile), strerror(errno));
-		goto error;
-
[PATCH 08/10] files_ref_store: use a transaction to update packed refs
When processing a `files_ref_store` transaction, it is sometimes
necessary to delete some references from the "packed-refs" file. Do
that using a reference transaction conducted against the
`packed_ref_store`.

This change further decouples `files_ref_store` from
`packed_ref_store`. It also fixes multiple problems, including the
two revealed by test cases added in the previous commit.

First, the old code didn't obtain the `packed-refs` lock until
`files_transaction_finish()`. This means that a failure to acquire
the `packed-refs` lock (e.g., due to contention with another process)
wasn't detected until it was too late (problems like this are
supposed to be detected in the "prepare" phase). The new code
acquires the `packed-refs` lock in `files_transaction_prepare()`, the
same stage of the processing when the loose reference locks are being
acquired, removing another reason why the "prepare" phase might
succeed and the "finish" phase might nevertheless fail.

Second, the old code deleted the loose version of a reference before
deleting any packed version of the same reference. This left a moment
when another process might think that the packed version of the
reference is current, which is incorrect. (Even worse, the packed
version of the reference can be arbitrarily old, and might even point
at an object that has since been garbage-collected.)

Third, if a reference deletion fails to acquire the `packed-refs`
lock altogether, then the old code might leave the repository in the
incorrect state (possibly corrupt) described in the previous
paragraph.

Now we activate the new "packed-refs" file (sans any references that
are being deleted) *before* deleting the corresponding loose
references. But we hold the "packed-refs" lock until after the loose
references have been finalized, thus preventing a simultaneous
"pack-refs" process from packing the loose version of the reference
in the time gap, which would otherwise defeat our attempt to delete
it.
Signed-off-by: Michael Haggerty
---
 refs/files-backend.c         | 132 +--
 t/t1404-update-ref-errors.sh |   4 +-
 2 files changed, 103 insertions(+), 33 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 29c7c78602..4f4c47b9db 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -2391,13 +2391,22 @@ static int lock_ref_for_update(struct files_ref_store *refs,
 	return 0;
 }
 
+struct files_transaction_backend_data {
+	struct ref_transaction *packed_transaction;
+	int packed_refs_locked;
+};
+
 /*
  * Unlock any references in `transaction` that are still locked, and
  * mark the transaction closed.
  */
-static void files_transaction_cleanup(struct ref_transaction *transaction)
+static void files_transaction_cleanup(struct files_ref_store *refs,
+				      struct ref_transaction *transaction)
 {
 	size_t i;
+	struct files_transaction_backend_data *backend_data =
+		transaction->backend_data;
+	struct strbuf err = STRBUF_INIT;
 
 	for (i = 0; i < transaction->nr; i++) {
 		struct ref_update *update = transaction->updates[i];
@@ -2409,6 +2418,17 @@ static void files_transaction_cleanup(struct ref_transaction *transaction)
 		}
 	}
 
+	if (backend_data->packed_transaction &&
+	    ref_transaction_abort(backend_data->packed_transaction, &err)) {
+		error("error aborting transaction: %s", err.buf);
+		strbuf_release(&err);
+	}
+
+	if (backend_data->packed_refs_locked)
+		packed_refs_unlock(refs->packed_ref_store);
+
+	free(backend_data);
+
 	transaction->state = REF_TRANSACTION_CLOSED;
 }
 
@@ -2425,12 +2445,17 @@ static int files_transaction_prepare(struct ref_store *ref_store,
 	char *head_ref = NULL;
 	int head_type;
 	struct object_id head_oid;
+	struct files_transaction_backend_data *backend_data;
+	struct ref_transaction *packed_transaction = NULL;
 
 	assert(err);
 
 	if (!transaction->nr)
 		goto cleanup;
 
+	backend_data = xcalloc(1, sizeof(*backend_data));
+	transaction->backend_data = backend_data;
+
 	/*
 	 * Fail if a refname appears more than once in the
 	 * transaction. (If we end up splitting up any updates using
@@ -2497,6 +2522,41 @@ static int files_transaction_prepare(struct ref_store *ref_store,
 					  head_ref, &affected_refnames, err);
 		if (ret)
 			break;
+
+		if (update->flags & REF_DELETING &&
+		    !(update->flags & REF_LOG_ONLY) &&
+		    !(update->flags & REF_ISPRUNING)) {
+			/*
+			 * This reference has to be deleted from
+			 * packed-refs if it exists there.
+			 */
+			if
[PATCH 01/10] packed-backend: don't adjust the reference count on lock/unlock
The old code incremented the packed ref cache reference count when
acquiring the packed-refs lock, and decremented the count when
releasing the lock. This is unnecessary because a locked packed-refs
file cannot be changed, so there is no reason for the cache to become
stale.

Moreover, the extra reference count causes a problem if we
intentionally clear the packed refs cache, as we sometimes need to do
if we change the cache in anticipation of writing a change to disk,
but then the write to disk fails. In that case, `packed_refs_unlock()`
would have no easy way to find the cache whose reference count it
needs to decrement.

This whole issue will soon become moot due to upcoming changes that
avoid changing the in-memory cache as part of updating the packed-refs
on disk, but this change makes that transition easier.

Signed-off-by: Michael Haggerty
---
 refs/packed-backend.c | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 412c85034f..7e348feac3 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -525,7 +525,6 @@ int packed_refs_lock(struct ref_store *ref_store, int flags, struct strbuf *err)
 				"packed_refs_lock");
 	static int timeout_configured = 0;
 	static int timeout_value = 1000;
-	struct packed_ref_cache *packed_ref_cache;
 
 	if (!timeout_configured) {
 		git_config_get_int("core.packedrefstimeout", &timeout_value);
@@ -560,9 +559,7 @@ int packed_refs_lock(struct ref_store *ref_store, int flags, struct strbuf *err)
 	 */
 	validate_packed_ref_cache(refs);
 
-	packed_ref_cache = get_packed_ref_cache(refs);
-	/* Increment the reference count to prevent it from being freed: */
-	acquire_packed_ref_cache(packed_ref_cache);
+	get_packed_ref_cache(refs);
 
 	return 0;
 }
@@ -576,7 +573,6 @@ void packed_refs_unlock(struct ref_store *ref_store)
 	if (!is_lock_file_locked(&refs->lock))
 		die("BUG: packed_refs_unlock() called when not locked");
 	rollback_lock_file(&refs->lock);
-	release_packed_ref_cache(refs->cache);
 }
 
 int packed_refs_is_locked(struct ref_store *ref_store)
-- 
2.14.1
[PATCH 04/10] packed_delete_refs(): implement method
Implement `packed_delete_refs()` using a reference transaction. This
means that `files_delete_refs()` can use `refs_delete_refs()` instead
of `repack_without_refs()` to delete any packed references,
decreasing the coupling between the classes.

Signed-off-by: Michael Haggerty
---
 refs/files-backend.c  |  2 +-
 refs/packed-backend.c | 45 -
 2 files changed, 45 insertions(+), 2 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index fccbc24ac4..2c78f63494 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -1157,7 +1157,7 @@ static int files_delete_refs(struct ref_store *ref_store, const char *msg,
 	if (packed_refs_lock(refs->packed_ref_store, 0, &err))
 		goto error;
 
-	if (repack_without_refs(refs->packed_ref_store, refnames, &err)) {
+	if (refs_delete_refs(refs->packed_ref_store, msg, refnames, flags)) {
 		packed_refs_unlock(refs->packed_ref_store);
 		goto error;
 	}
diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index d19b3bfba5..83a088118f 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -1082,7 +1082,50 @@ static int packed_initial_transaction_commit(struct ref_store *ref_store,
 static int packed_delete_refs(struct ref_store *ref_store, const char *msg,
 			      struct string_list *refnames, unsigned int flags)
 {
-	die("BUG: not implemented yet");
+	struct packed_ref_store *refs =
+		packed_downcast(ref_store, REF_STORE_WRITE, "delete_refs");
+	struct strbuf err = STRBUF_INIT;
+	struct ref_transaction *transaction;
+	struct string_list_item *item;
+	int ret;
+
+	(void)refs; /* We need the check above, but don't use the variable */
+
+	if (!refnames->nr)
+		return 0;
+
+	/*
+	 * Since we don't check the references' old_oids, the
+	 * individual updates can't fail, so we can pack all of the
+	 * updates into a single transaction.
+	 */
+
+	transaction = ref_store_transaction_begin(ref_store, &err);
+	if (!transaction)
+		return -1;
+
+	for_each_string_list_item(item, refnames) {
+		if (ref_transaction_delete(transaction, item->string, NULL,
+					   flags, msg, &err)) {
+			warning(_("could not delete reference %s: %s"),
+				item->string, err.buf);
+			strbuf_reset(&err);
+		}
+	}
+
+	ret = ref_transaction_commit(transaction, &err);
+
+	if (ret) {
+		if (refnames->nr == 1)
+			error(_("could not delete reference %s: %s"),
+			      refnames->items[0].string, err.buf);
+		else
+			error(_("could not delete references: %s"), err.buf);
+	}
+
+	ref_transaction_free(transaction);
+	strbuf_release(&err);
+	return ret;
 }
 
 static int packed_pack_refs(struct ref_store *ref_store, unsigned int flags)
-- 
2.14.1
[PATCH 06/10] files_initial_transaction_commit(): use a transaction for packed refs
Use a `packed_ref_store` transaction in the implementation of
`files_initial_transaction_commit()` rather than using internal
features of the packed ref store. This further decouples
`files_ref_store` from `packed_ref_store`.

Signed-off-by: Michael Haggerty
---
 refs/files-backend.c | 29 +++++++++++---------
 1 file changed, 19 insertions(+), 10 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 0fc44eaca5..29c7c78602 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -2663,6 +2663,7 @@ static int files_initial_transaction_commit(struct ref_store *ref_store,
 	size_t i;
 	int ret = 0;
 	struct string_list affected_refnames = STRING_LIST_INIT_NODUP;
+	struct ref_transaction *packed_transaction = NULL;
 
 	assert(err);
 
@@ -2695,6 +2696,12 @@ static int files_initial_transaction_commit(struct ref_store *ref_store,
 			  &affected_refnames))
 		die("BUG: initial ref transaction called with existing refs");
 
+	packed_transaction = ref_store_transaction_begin(refs->packed_ref_store, err);
+	if (!packed_transaction) {
+		ret = TRANSACTION_GENERIC_ERROR;
+		goto cleanup;
+	}
+
 	for (i = 0; i < transaction->nr; i++) {
 		struct ref_update *update = transaction->updates[i];
 
@@ -2707,6 +2714,15 @@ static int files_initial_transaction_commit(struct ref_store *ref_store,
 			ret = TRANSACTION_NAME_CONFLICT;
 			goto cleanup;
 		}
+
+		/*
+		 * Add a reference creation for this reference to the
+		 * packed-refs transaction:
+		 */
+		ref_transaction_add_update(packed_transaction, update->refname,
+					   update->flags & ~REF_HAVE_OLD,
+					   update->new_oid.hash, update->old_oid.hash,
+					   NULL);
 	}
 
 	if (packed_refs_lock(refs->packed_ref_store, 0, err)) {
@@ -2714,21 +2730,14 @@ static int files_initial_transaction_commit(struct ref_store *ref_store,
 		goto cleanup;
 	}
 
-	for (i = 0; i < transaction->nr; i++) {
-		struct ref_update *update = transaction->updates[i];
-
-		if ((update->flags & REF_HAVE_NEW) &&
-		    !is_null_oid(&update->new_oid))
-			add_packed_ref(refs->packed_ref_store, update->refname,
-				       &update->new_oid);
-	}
-
-	if (commit_packed_refs(refs->packed_ref_store, err)) {
+	if (initial_ref_transaction_commit(packed_transaction, err)) {
 		ret = TRANSACTION_GENERIC_ERROR;
 		goto cleanup;
 	}
 
 cleanup:
+	if (packed_transaction)
+		ref_transaction_free(packed_transaction);
 	packed_refs_unlock(refs->packed_ref_store);
 	transaction->state = REF_TRANSACTION_CLOSED;
 	string_list_clear(&affected_refnames, 0);
-- 
2.14.1
[PATCH 05/10] files_pack_refs(): use a reference transaction to write packed refs
Now that the packed reference store supports transactions, we can use
a transaction to write the packed versions of references that we want
to pack. This decreases the coupling between `files_ref_store` and
`packed_ref_store`.

Signed-off-by: Michael Haggerty
---
 refs/files-backend.c | 24 +---
 1 file changed, 17 insertions(+), 7 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 2c78f63494..0fc44eaca5 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -1100,9 +1100,14 @@ static int files_pack_refs(struct ref_store *ref_store, unsigned int flags)
 	int ok;
 	struct ref_to_prune *refs_to_prune = NULL;
 	struct strbuf err = STRBUF_INIT;
+	struct ref_transaction *transaction;
 
 	packed_refs_lock(refs->packed_ref_store, LOCK_DIE_ON_ERROR, &err);
 
+	transaction = ref_store_transaction_begin(refs->packed_ref_store, &err);
+	if (!transaction)
+		return -1;
+
 	iter = cache_ref_iterator_begin(get_loose_ref_cache(refs), NULL, 0);
 	while ((ok = ref_iterator_advance(iter)) == ITER_OK) {
 		/*
@@ -1115,12 +1120,14 @@ static int files_pack_refs(struct ref_store *ref_store, unsigned int flags)
 			continue;
 
 		/*
-		 * Create an entry in the packed-refs cache equivalent
-		 * to the one from the loose ref cache, except that
-		 * we don't copy the peeled status, because we want it
-		 * to be re-peeled.
+		 * Add a reference creation for this reference to the
+		 * packed-refs transaction:
 		 */
-		add_packed_ref(refs->packed_ref_store, iter->refname, iter->oid);
+		if (ref_transaction_update(transaction, iter->refname,
+					   iter->oid->hash, NULL,
+					   REF_NODEREF, NULL, &err))
+			die("failure preparing to create packed reference %s: %s",
+			    iter->refname, err.buf);
 
 		/* Schedule the loose reference for pruning if requested. */
 		if ((flags & PACK_REFS_PRUNE)) {
@@ -1134,8 +1141,11 @@ static int files_pack_refs(struct ref_store *ref_store, unsigned int flags)
 	if (ok != ITER_DONE)
 		die("error while iterating over references");
 
-	if (commit_packed_refs(refs->packed_ref_store, &err))
-		die("unable to overwrite old ref-pack file: %s", err.buf);
+	if (ref_transaction_commit(transaction, &err))
+		die("unable to write new packed-refs: %s", err.buf);
+
+	ref_transaction_free(transaction);
+
 	packed_refs_unlock(refs->packed_ref_store);
 
 	prune_refs(refs, refs_to_prune);
-- 
2.14.1
[PATCH 07/10] t1404: demonstrate two problems with reference transactions
Currently, a loose reference is deleted even before locking the
`packed-refs` file, let alone deleting any packed version of the
reference. This leads to two problems, demonstrated by two new tests:

* While a reference is being deleted, other processes might see the
  old, packed value of the reference for a moment before the packed
  version is deleted. Normally this would be hard to observe, but we
  can prolong the window by locking the `packed-refs` file externally
  before running `update-ref`, then unlocking it before
  `update-ref`'s attempt to acquire the lock times out.

* If the `packed-refs` file is locked so long that `update-ref` fails
  to lock it, then the reference can be left permanently in the
  incorrect state described in the previous point.

In a moment, both problems will be fixed.

Signed-off-by: Michael Haggerty
---
The first of the new tests is rather involved; it uses two background
processes plus a foreground process that polls the value of a
reference. But despite sensitivity to timing, I think it should be
robust even in this broken state. Once the functionality being tested
is fixed, this test should never produce false positives, though
really bad timing (e.g., if it takes more than a second for
`update-ref` to get going) could still lead to false negatives.

Each of the new tests takes about a second to run because they
simulate lock contention.
If anybody has suggestions for better ways to test these things, please speak up :-) t/t1404-update-ref-errors.sh | 71 1 file changed, 71 insertions(+) diff --git a/t/t1404-update-ref-errors.sh b/t/t1404-update-ref-errors.sh index c34ece48f5..752f83c377 100755 --- a/t/t1404-update-ref-errors.sh +++ b/t/t1404-update-ref-errors.sh @@ -404,4 +404,75 @@ test_expect_success 'broken reference blocks indirect create' ' test_cmp expected output.err ' +test_expect_failure 'no bogus intermediate values during delete' ' + prefix=refs/slow-transaction && + # Set up a reference with differing loose and packed versions: + git update-ref $prefix/foo $C && + git pack-refs --all && + git update-ref $prefix/foo $D && + git for-each-ref $prefix >unchanged && + # Now try to update the reference, but hold the `packed-refs` lock + # for a while to see what happens while the process is blocked: + : >.git/packed-refs.lock && + test_when_finished "rm -f .git/packed-refs.lock" && + { + sleep 1 && + rm -f .git/packed-refs.lock & + } && + pid1=$! && + { + # Note: the following command is intentionally run in the + # background. We extend the timeout so that `update-ref` + # tries to acquire the `packed-refs` lock longer than it + # takes the background process above to delete it: + git -c core.packedrefstimeout=2000 update-ref -d $prefix/foo & + } && + pid2=$! && + ok=true && + while kill -0 $pid2 2>/dev/null + do + sha1=$(git rev-parse --verify --quiet $prefix/foo || echo undefined) && + case "$sha1" in + $D) + # This is OK; it just means that nothing has happened yet. + : ;; + undefined) + # This is OK; it means the deletion was successful. + : ;; + $C) + # This value should never be seen. Probably the loose + # reference has been deleted but the packed reference + # is still there: + echo "$prefix/foo incorrectly observed to be C" && + break + ;; + *) + # WTF? 
+ echo "$prefix/foo unexpected value observed: $sha1" && + break + ;; + esac + done >out && + wait $pid1 && + wait $pid2 && + test_must_be_empty out && + test_must_fail git rev-parse --verify --quiet $prefix/foo +' + +test_expect_failure 'delete fails cleanly if packed-refs file is locked' ' + prefix=refs/locked-packed-refs && + # Set up a reference with differing loose and packed versions: + git update-ref $prefix/foo $C && + git pack-refs --all && + git update-ref $prefix/foo $D && + git for-each-ref $prefix >unchanged && + # Now try to delete it while the `packed-refs` lock is held: + : >.git/packed-refs.lock && + test_when_finished "rm -f .git/packed-refs.lock" && + test_must_fail git update-ref -d $prefix/foo >out 2>err && + git for-each-ref $prefix >actual && + grep "Unable to create $Q.*packed-refs.lock$Q: File exists" err && + test_cmp unchanged actual +' + test_done -- 2.14.1
[PATCH 00/10] Implement transactions for the packed ref store
This should be the second-to-last patch series in the quest for mmapped packed-refs. Before this patch series, there is still more coupling than necessary between `files_ref_store` and `packed_ref_store`: * `files_ref_store` adds packed references (e.g., during `git clone` and `git pack-refs`) by inserting them into the `packed_ref_store`'s internal `ref_cache`, then calling `commit_packed_refs()`. But `files_ref_store` shouldn't even *know* that `packed_ref_store` uses a `ref_cache`, let alone muck about in it. * `files_ref_store` deletes packed references by calling `repack_without_refs()`. Instead, implement reference transactions and `delete_refs()` for `packed_ref_store`, and change `files_ref_store` to use these standard methods rather than internal `packed_ref_store` methods. This patch series finishes up the encapsulation of `packed_ref_store`. At the end of the series, the outside world only interacts with it via the refs API plus a couple of locking-related functions. That will make it easy for the next patch series to change the internal workings of `packed_ref_store` without affecting `files_ref_store`. Along the way, we fix some longstanding problems with reference updates. Quoting from patch [08/10]: First, the old code didn't obtain the `packed-refs` lock until `files_transaction_finish()`. This means that a failure to acquire the `packed-refs` lock (e.g., due to contention with another process) wasn't detected until it was too late (problems like this are supposed to be detected in the "prepare" phase). The new code acquires the `packed-refs` lock in `files_transaction_prepare()`, the same stage of the processing when the loose reference locks are being acquired, removing another reason why the "prepare" phase might succeed and the "finish" phase might nevertheless fail. Second, the old code deleted the loose version of a reference before deleting any packed version of the same reference. 
This left a moment when another process might think that the packed version of the reference is current, which is incorrect. (Even worse, the packed version of the reference can be arbitrarily old, and might even point at an object that has since been garbage-collected.) Third, if a reference deletion fails to acquire the `packed-refs` lock altogether, then the old code might leave the repository in the incorrect state (possibly corrupt) described in the previous paragraph. Now we activate the new "packed-refs" file (sans any references that are being deleted) *before* deleting the corresponding loose references. But we hold the "packed-refs" lock until after the loose references have been finalized, thus preventing a simultaneous "pack-refs" process from packing the loose version of the reference in the time gap, which would otherwise defeat our attempt to delete it. This patch series is also available as branch `packed-ref-transactions` in my fork on GitHub [1]. A draft of the patch series to change `packed_ref_store` to use mmap and get rid of its `ref_cache` is available as branch `mmap-packed-refs` from the same source. 
Michael

[1] https://github.com/mhagger/git

Michael Haggerty (10):
  packed-backend: don't adjust the reference count on lock/unlock
  struct ref_transaction: add a place for backends to store data
  packed_ref_store: implement reference transactions
  packed_delete_refs(): implement method
  files_pack_refs(): use a reference transaction to write packed refs
  files_initial_transaction_commit(): use a transaction for packed refs
  t1404: demonstrate two problems with reference transactions
  files_ref_store: use a transaction to update packed refs
  packed-backend: rip out some now-unused code
  files_transaction_finish(): delete reflogs before references

 refs/files-backend.c         | 200 ++-
 refs/packed-backend.c        | 457 +--
 refs/packed-backend.h        |  17 +-
 refs/refs-internal.h         |   1 +
 t/t1404-update-ref-errors.sh |  71 +++
 5 files changed, 534 insertions(+), 212 deletions(-)

-- 
2.14.1
Re: [PATCH 2/3] merge-recursive: remove return value from get_files_dirs
On Mon, Aug 28, 2017 at 06:45:29PM -0400, Ben Peart wrote:

> > Since the debug output has been removed and the caller isn't
> > checking the return value there is no reason to keep calulating
> > and returning a value.
> >
>
> Did a quick search and you're right, nothing is using the return
> value. No reason to spend time calculating it. Looks good.

This is actually one we can reasonably trust the compiler to check for
us. Though of course it doesn't hurt to look at the callers and make
sure that it's sane that they do not care about this return value (I
don't know merge-recursive.c very well, but the current return value
does seem kind of useless).

-Peff
Re: [PATCH 2/3] merge-recursive: remove return value from get_files_dirs
On Mon, Aug 28, 2017 at 02:28:28PM -0600, Kevin Willford wrote:

> The return value of the get_files_dirs call is never being used.
> Looking at the history of the file and it was originally only
> being used for debug output statements. Also when
> read_tree_recursive return value is non zero it is changed to
> zero. This leads me to believe that it doesn't matter if
> read_tree_recursive gets an error.

Or that the function is buggy. :)

I'm tempted to say that we should probably die() when
read_tree_recursive fails. This should only happen if we fail to
parse the tree, or if our callback (save_files_dirs here) returns
failure, and the latter looks like it never happens.

> Since the debug output has been removed and the caller isn't
> checking the return value there is no reason to keep calulating
> and returning a value.

Agreed, and I'm happy to see dead code go. Minor nit:
s/calulating/calculating/ in your commit message.

-Peff
Re: [PATCH 1/3] merge-recursive: fix memory leak
On Mon, Aug 28, 2017 at 02:28:27PM -0600, Kevin Willford wrote:

> In merge_trees if process_renames or process_entry returns less
> than zero, the method will just return and not free re_merge,
> re_head, or entries.
>
> This change cleans up the allocated variables before returning
> to the caller.

Good catch. I suspect this function could stand to be refactored a
bit. For instance, pulling those inner bits of the conditional into a
helper would let us do:

  re_merge = get_renames(...);
  ... other setup ...
  clean = our_new_helper(re_merge, ...);
  string_clear(re_merge);
  ... other cleanup ...
  if (clean < 0)
          return clean;

without having to resort to a goto. But certainly I don't mind this
much more minimal change, which fixes the actual functionality
problem.

-Peff