Hi Jonathan,

On Wed, 22 Mar 2017, Jonathan Nieder wrote:

> Johannes Schindelin wrote:
> 
> > As to the default of seriously slowing down all SHA-1 computations:
> > since you made that the default, at compile time, with no way to turn
> > on the faster computation, this will have a major, negative impact.
> > Are you really, really sure you want to do that?
> >
> > I thought that it was obvious that we would have at least a runtime
> > option to lessen the load.
> 
> It's not obvious to me.  I agree that the DC_SHA1 case can be sped up,
> e.g. by turning off the collision detection for sha1 calculations that
> are not part of fetching, receiving a push, or running fsck.

And in those cases, using OpenSSL instead is *even* faster.

> To be clear, are you saying that this is a bad compile-time default
> because distributors are going to leave it and end-users will end up
> with a bad experience?  Or are you saying distributors have no good
> alternative to choose at compile time?  Or something else?

What I am saying is that this should be a more fine-grained, runtime knob.

If I write out an index, I should not suffer the slowdown from detecting
collisions. Because I implicitly trust myself and everything that I added
(and everything that was checked before already). This may not matter with
small projects. But we know a couple of real-world scenarios where this
matters.

Imagine for example the insane repository described by my colleague Saeed
Noursalehi at GitMerge. It is *ginormous*.

The index is 300MB. If you have to experience a sudden drop in performance
of `git add`, even by "only" 30%, relative to OpenSSL, it is very
noticeable. It is painful.

That is the reason why we spent considerable time trying to enhance
performance of SHA-1 hashing even by as little as a couple of percentage
points here and there. The accumulated wins are noticeable, and
I assume that those wins are completely annihilated by the heavy-handed
switch to detect collisions always.

It gets even worse when it comes to fetching, let alone cloning.

And please note that the gigantic repository I mentioned above is a
company-internal one, i.e. the servers/repository are implicitly trusted.
Having to pay the price of a full clone going from 12+ hours to even only
15+ hours *hurts*. Particularly when that price is paid for no value in
return at all: the server *already* will have checked for crafted objects.

I could imagine that this problem could be addressed to everybody's
satisfaction by introducing a tristate config setting: collision detection
could be switched "on" or "off", or set to, say, "external", i.e.
collision detection would be switched on only when objects are retrieved
from somewhere other than the local repository (e.g. in git-receive-pack).

If fetching or cloning from a trusted source, this config setting could be
switched off on the command-line, otherwise left at "external".

And by "switching collision detection off", I of course refer to *not*
using SHA1DC's routines at all, but what would have been used originally,
in Git for Windows' case: (hardware-accelerated) OpenSSL.
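
To make the tristate idea concrete, here is a rough sketch in C of how
such a setting might be parsed and consulted. It is only an illustration
under stated assumptions: every name in it (the enums, the two functions,
the "on"/"off"/"external" spellings) is made up for this example and does
not exist in Git.

```c
#include <string.h>

/* Hypothetical tristate; none of these names exist in Git. */
enum collision_check_mode {
	COLLISION_CHECK_OFF,      /* never detect collisions; use the fast backend */
	COLLISION_CHECK_EXTERNAL, /* detect only for objects coming from elsewhere */
	COLLISION_CHECK_ON        /* always detect collisions */
};

/* Where the data being hashed comes from (also hypothetical). */
enum object_origin {
	ORIGIN_LOCAL,   /* e.g. writing out the index during `git add` */
	ORIGIN_EXTERNAL /* e.g. fetch, clone, git-receive-pack */
};

/* Parse a config value; returns 0 on success, -1 on an unknown value. */
static int parse_collision_check_mode(const char *value,
				      enum collision_check_mode *mode)
{
	if (!strcmp(value, "off"))
		*mode = COLLISION_CHECK_OFF;
	else if (!strcmp(value, "external"))
		*mode = COLLISION_CHECK_EXTERNAL;
	else if (!strcmp(value, "on"))
		*mode = COLLISION_CHECK_ON;
	else
		return -1;
	return 0;
}

/* Returns 1 if the collision-detecting SHA-1 should be used, else 0. */
static int want_collision_check(enum collision_check_mode mode,
				enum object_origin origin)
{
	switch (mode) {
	case COLLISION_CHECK_ON:
		return 1;
	case COLLISION_CHECK_EXTERNAL:
		return origin == ORIGIN_EXTERNAL;
	case COLLISION_CHECK_OFF:
	default:
		return 0;
	}
}
```

With the mode at "external", writing the index would take the fast path
while anything received over the wire would still be checked. With the
mode at "off", even fetched objects would go through the fast backend,
which is what one would want for a trusted, company-internal server.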

Did I manage to clarify the problem?
Johannes
