Re: [PATCH v2 1/2] read-cache: skip index SHA verification
On Tue, Mar 28, 2017 at 11:27:19AM -0400, Jeff Hostetler wrote: > > Hrm, there shouldn't be any dependency of the config on the index (and > > there are a handful of options which impact the index already). Did you > > try it and run into problems? > > Yeah, I tried adding a new "core.verifyindex" property and the > corresponding global variable. But read_index() and verify_hdr() > was being called BEFORE the config was loaded. And it wasn't clear > how best to solve that. > > The issue was in "git status" where cmd_status() called > status_init_config() which called gitmodules_config() before > git_config(). but gitmodules_config() called read_index(), > so my settings weren't loaded yet in verify_hdr(). Ugh, yeah, the callback-oriented interface suffers from these kind of dependency cycles. You can fix it by doing a limited "basic config that should always be loaded" git_config() call before anything else, and then following up with the application-level config. For something low-level that should _always_ be respected, even in plumbing programs, I think we're better off lazy-loading the config inside the function. The configset cache makes them more or less free. I.e., something like: diff --git a/read-cache.c b/read-cache.c index e44775182..89bbf8d1e 100644 --- a/read-cache.c +++ b/read-cache.c @@ -1376,17 +1376,23 @@ static int verify_hdr(struct cache_header *hdr, unsigned long size) git_SHA_CTX c; unsigned char sha1[20]; int hdr_version; + int do_checksum = 0; if (hdr->hdr_signature != htonl(CACHE_SIGNATURE)) return error("bad signature"); hdr_version = ntohl(hdr->hdr_version); if (hdr_version < INDEX_FORMAT_LB || INDEX_FORMAT_UB < hdr_version) return error("bad index version %d", hdr_version); - git_SHA1_Init(); - git_SHA1_Update(, hdr, size - 20); - git_SHA1_Final(sha1, ); - if (hashcmp(sha1, (unsigned char *)hdr + size - 20)) - return error("bad index file sha1 signature"); + + git_config_get_bool("core.checksumindex", _checksum); + if (do_checksum) { + git_SHA1_Init(); + git_SHA1_Update(, hdr, size - 20); + git_SHA1_Final(sha1, ); + if (hashcmp(sha1, (unsigned char *)hdr + size - 20)) + return error("bad index file sha1 signature"); + } + return 0; } -Peff
Re: [PATCH v2 1/2] read-cache: skip index SHA verification
On 3/27/2017 6:44 PM, Jeff King wrote: On Mon, Mar 27, 2017 at 09:09:38PM +, g...@jeffhostetler.com wrote: From: Jeff HostetlerTeach git to skip verification of the index SHA in read_index(). This is a performance optimization. The index file SHA verification can be considered an ancient relic from the early days of git and only useful for detecting disk corruption. For small repositories, this SHA calculation is not that significant, but for gigantic repositories this calculation adds significant time to every command. I added a global "skip_verify_index" variable to control this and allow it to be tested. I did not create a config setting for this because of chicken-n-egg problems with the loading the config and the index. Hrm, there shouldn't be any dependency of the config on the index (and there are a handful of options which impact the index already). Did you try it and run into problems? Yeah, I tried adding a new "core.verifyindex" property and the corresponding global variable. But read_index() and verify_hdr() was being called BEFORE the config was loaded. And it wasn't clear how best to solve that. The issue was in "git status" where cmd_status() called status_init_config() which called gitmodules_config() before git_config(). but gitmodules_config() called read_index(), so my settings weren't loaded yet in verify_hdr(). I tried switching the order in status_init_config(), but that caused errors in t7508 with submodules not being handled properly. https://github.com/jeffhostetler/git/commits/upstream/core_verify_index At this point I decided that it wasn't that important to have this config setting, since we'll probably default it to be faster and be done with it. In general, I'd much rather see us either: 1. Rip the code out entirely if it is not meant to be configurable, and cannot be triggered by the actual git binary. or 2. Make it configurable, even if most people wouldn't use it. And then have a test to exercise it using a git command (unlike the one-off test helper, which isn't run at all). -Peff I'm OK with (1) if everyone else is. Jeff
Re: [PATCH v2 1/2] read-cache: skip index SHA verification
On Mon, Mar 27, 2017 at 04:32:10PM -0700, Jonathan Nieder wrote: > Jeff King wrote: > > > Hrm, there shouldn't be any dependency of the config on the index (and > > there are a handful of options which impact the index already). Did you > > try it and run into problems? > > > > In general, I'd much rather see us either: > > > > 1. Rip the code out entirely if it is not meant to be configurable, > > and cannot be triggered by the actual git binary. > > > > or > > > > 2. Make it configurable, even if most people wouldn't use it. And then > > have a test to exercise it using a git command (unlike the one-off > > test helper, which isn't run at all). > > Agreed with this, leaning toward (1). > > If "git fsck" verifies the .git/index file then I don't see any need > for other commands to. Yeah, code that _can_ be run but almost nobody does is possibly worse than code that can't be run. :) I agree that it would make sense for fsck to check it, though (just like it checks the actual pack trailer checksums, even though normal operations do not). -Peff
Re: [PATCH v2 1/2] read-cache: skip index SHA verification
Jeff King wrote: > Hrm, there shouldn't be any dependency of the config on the index (and > there are a handful of options which impact the index already). Did you > try it and run into problems? > > In general, I'd much rather see us either: > > 1. Rip the code out entirely if it is not meant to be configurable, > and cannot be triggered by the actual git binary. > > or > > 2. Make it configurable, even if most people wouldn't use it. And then > have a test to exercise it using a git command (unlike the one-off > test helper, which isn't run at all). Agreed with this, leaning toward (1). If "git fsck" verifies the .git/index file then I don't see any need for other commands to. Thanks and hope that helps, Jonathan
Re: [PATCH v2 1/2] read-cache: skip index SHA verification
On Mon, Mar 27, 2017 at 09:09:38PM +, g...@jeffhostetler.com wrote: > From: Jeff Hostetler> > Teach git to skip verification of the index SHA in read_index(). > > This is a performance optimization. The index file SHA verification > can be considered an ancient relic from the early days of git and only > useful for detecting disk corruption. For small repositories, this > SHA calculation is not that significant, but for gigantic repositories > this calculation adds significant time to every command. > > I added a global "skip_verify_index" variable to control this and > allow it to be tested. > > I did not create a config setting for this because of chicken-n-egg > problems with the loading the config and the index. Hrm, there shouldn't be any dependency of the config on the index (and there are a handful of options which impact the index already). Did you try it and run into problems? In general, I'd much rather see us either: 1. Rip the code out entirely if it is not meant to be configurable, and cannot be triggered by the actual git binary. or 2. Make it configurable, even if most people wouldn't use it. And then have a test to exercise it using a git command (unlike the one-off test helper, which isn't run at all). -Peff