Re: [PATCH v2 1/2] read-cache: skip index SHA verification

2017-03-28 Thread Jeff King
On Tue, Mar 28, 2017 at 11:27:19AM -0400, Jeff Hostetler wrote:

> > Hrm, there shouldn't be any dependency of the config on the index (and
> > there are a handful of options which impact the index already). Did you
> > try it and run into problems?
> 
> Yeah, I tried adding a new "core.verifyindex" property and the
> corresponding global variable.  But read_index() and verify_hdr()
> was being called BEFORE the config was loaded.  And it wasn't clear
> how best to solve that.
> 
> The issue was in "git status" where cmd_status() called
> status_init_config() which called gitmodules_config() before
> git_config().  but gitmodules_config() called read_index(),
> so my settings weren't loaded yet in verify_hdr().

Ugh, yeah, the callback-oriented interface suffers from these kind of
dependency cycles. You can fix it by doing a limited "basic config that
should always be loaded" git_config() call before anything else, and
then following up with the application-level config.

For something low-level that should _always_ be respected, even in
plumbing programs, I think we're better off lazy-loading the config
inside the function. The configset cache makes them more or less free.

I.e., something like:

diff --git a/read-cache.c b/read-cache.c
index e44775182..89bbf8d1e 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1376,17 +1376,23 @@ static int verify_hdr(struct cache_header *hdr, 
unsigned long size)
git_SHA_CTX c;
unsigned char sha1[20];
int hdr_version;
+   int do_checksum = 0;
 
if (hdr->hdr_signature != htonl(CACHE_SIGNATURE))
return error("bad signature");
hdr_version = ntohl(hdr->hdr_version);
if (hdr_version < INDEX_FORMAT_LB || INDEX_FORMAT_UB < hdr_version)
return error("bad index version %d", hdr_version);
-   git_SHA1_Init();
-   git_SHA1_Update(, hdr, size - 20);
-   git_SHA1_Final(sha1, );
-   if (hashcmp(sha1, (unsigned char *)hdr + size - 20))
-   return error("bad index file sha1 signature");
+
+   git_config_get_bool("core.checksumindex", _checksum);
+   if (do_checksum) {
+   git_SHA1_Init();
+   git_SHA1_Update(, hdr, size - 20);
+   git_SHA1_Final(sha1, );
+   if (hashcmp(sha1, (unsigned char *)hdr + size - 20))
+   return error("bad index file sha1 signature");
+   }
+
return 0;
 }

-Peff


Re: [PATCH v2 1/2] read-cache: skip index SHA verification

2017-03-28 Thread Jeff Hostetler



On 3/27/2017 6:44 PM, Jeff King wrote:

On Mon, Mar 27, 2017 at 09:09:38PM +, g...@jeffhostetler.com wrote:


From: Jeff Hostetler 

Teach git to skip verification of the index SHA in read_index().

This is a performance optimization.  The index file SHA verification
can be considered an ancient relic from the early days of git and only
useful for detecting disk corruption.  For small repositories, this
SHA calculation is not that significant, but for gigantic repositories
this calculation adds significant time to every command.

I added a global "skip_verify_index" variable to control this and
allow it to be tested.

I did not create a config setting for this because of chicken-n-egg
problems with the loading the config and the index.


Hrm, there shouldn't be any dependency of the config on the index (and
there are a handful of options which impact the index already). Did you
try it and run into problems?


Yeah, I tried adding a new "core.verifyindex" property and the
corresponding global variable.  But read_index() and verify_hdr()
was being called BEFORE the config was loaded.  And it wasn't clear
how best to solve that.

The issue was in "git status" where cmd_status() called
status_init_config() which called gitmodules_config() before
git_config().  but gitmodules_config() called read_index(),
so my settings weren't loaded yet in verify_hdr().

I tried switching the order in status_init_config(), but
that caused errors in t7508 with submodules not being handled
properly.

https://github.com/jeffhostetler/git/commits/upstream/core_verify_index

At this point I decided that it wasn't that important to have
this config setting, since we'll probably default it to be faster
and be done with it.



In general, I'd much rather see us either:

  1. Rip the code out entirely if it is not meant to be configurable,
 and cannot be triggered by the actual git binary.

or

  2. Make it configurable, even if most people wouldn't use it. And then
 have a test to exercise it using a git command (unlike the one-off
 test helper, which isn't run at all).

-Peff



I'm OK with (1) if everyone else is.

Jeff



Re: [PATCH v2 1/2] read-cache: skip index SHA verification

2017-03-27 Thread Jeff King
On Mon, Mar 27, 2017 at 04:32:10PM -0700, Jonathan Nieder wrote:

> Jeff King wrote:
> 
> > Hrm, there shouldn't be any dependency of the config on the index (and
> > there are a handful of options which impact the index already). Did you
> > try it and run into problems?
> >
> > In general, I'd much rather see us either:
> >
> >   1. Rip the code out entirely if it is not meant to be configurable,
> >  and cannot be triggered by the actual git binary.
> >
> > or
> >
> >   2. Make it configurable, even if most people wouldn't use it. And then
> >  have a test to exercise it using a git command (unlike the one-off
> >  test helper, which isn't run at all).
> 
> Agreed with this, leaning toward (1).
> 
> If "git fsck" verifies the .git/index file then I don't see any need
> for other commands to.

Yeah, code that _can_ be run but almost nobody does is possibly worse
than code that can't be run. :)

I agree that it would make sense for fsck to check it, though (just like
it checks the actual pack trailer checksums, even though normal
operations do not).

-Peff


Re: [PATCH v2 1/2] read-cache: skip index SHA verification

2017-03-27 Thread Jonathan Nieder
Jeff King wrote:

> Hrm, there shouldn't be any dependency of the config on the index (and
> there are a handful of options which impact the index already). Did you
> try it and run into problems?
>
> In general, I'd much rather see us either:
>
>   1. Rip the code out entirely if it is not meant to be configurable,
>  and cannot be triggered by the actual git binary.
>
> or
>
>   2. Make it configurable, even if most people wouldn't use it. And then
>  have a test to exercise it using a git command (unlike the one-off
>  test helper, which isn't run at all).

Agreed with this, leaning toward (1).

If "git fsck" verifies the .git/index file then I don't see any need
for other commands to.

Thanks and hope that helps,
Jonathan


Re: [PATCH v2 1/2] read-cache: skip index SHA verification

2017-03-27 Thread Jeff King
On Mon, Mar 27, 2017 at 09:09:38PM +, g...@jeffhostetler.com wrote:

> From: Jeff Hostetler 
> 
> Teach git to skip verification of the index SHA in read_index().
> 
> This is a performance optimization.  The index file SHA verification
> can be considered an ancient relic from the early days of git and only
> useful for detecting disk corruption.  For small repositories, this
> SHA calculation is not that significant, but for gigantic repositories
> this calculation adds significant time to every command.
> 
> I added a global "skip_verify_index" variable to control this and
> allow it to be tested.
> 
> I did not create a config setting for this because of chicken-n-egg
> problems with the loading the config and the index.

Hrm, there shouldn't be any dependency of the config on the index (and
there are a handful of options which impact the index already). Did you
try it and run into problems?

In general, I'd much rather see us either:

  1. Rip the code out entirely if it is not meant to be configurable,
 and cannot be triggered by the actual git binary.

or

  2. Make it configurable, even if most people wouldn't use it. And then
 have a test to exercise it using a git command (unlike the one-off
 test helper, which isn't run at all).

-Peff