On Tue, Feb 28, 2017 at 11:07:37AM -0800, Junio C Hamano wrote:

> Junio C Hamano <gits...@pobox.com> writes:
> 
> >>   [1/3]: add collision-detecting sha1 implementation
> >>   [2/3]: sha1dc: adjust header includes for git
> >>   [3/3]: Makefile: add USE_SHA1DC knob
> >
> > I was lazy so I fetched the above and then added this on top before
> > I start to play with it.
> >
> > -- >8 --
> > From: Junio C Hamano <gits...@pobox.com>
> > Date: Tue, 28 Feb 2017 10:39:25 -0800
> > Subject: [PATCH] sha1dc: resurrect LICENSE file
> 
> In a way similar to 8415558f55 ("sha1dc: avoid c99
> declaration-after-statement", 2017-02-24), we would want this on
> top.
> 
> -- >8 --
> Subject: sha1dc: avoid 'for' loop initial decl

Yeah, thanks, I had tweaked both that and the license thing locally but
had not pushed it out yet. Both are obvious improvements.

FWIW, I've been in touch with Dan Shumow, one of the authors, who has
been looking at whether we can speed up the sha1dc implementation, or
integrate the checks into the block-sha1 implementation.

Here are a few notes on the earlier timings I posted that came out in
our conversation:

  - the timings I showed earlier were actually openssl versus sha1dc.
    The block-sha1 timings fall somewhere in the middle:

      [running test-sha1 on a fresh linux.git packfile]
      1.347s openssl
      2.079s block-sha1
      4.983s sha1dc

      [index-pack --verify on a fresh git.git packfile]
       6.919s openssl
       9.003s block-sha1
      17.955s sha1dc

    Those are the operations that show off sha1 performance the most.
    The first one is really not even that interesting; it's raw
    sha1 performance. The second one is an actual operation users might
    notice (though not as --verify exactly, but as "index-pack --stdin"
    when receiving a fetch or a push).

    So there's room for improvement, but the gap between block-sha1
    and sha1dc is not quite as big as I showed earlier.

  - Dan timed the sha1dc implementation with and without the collision
    detection enabled. The sha1 implementation is only 1.33x slower than
    block-sha1 (for raw sha1 time). Adding in the detection makes it
    2.6x slower.

    So there's some potential gain from optimizing the sha1
    implementation, but ultimately we may be looking at a 2x slowdown to
    add in the collision detection.

    It doesn't need to happen for _every_ sha1 we compute, but the
    index-pack case is the one that almost certainly _does_ want it,
    because that's when we're admitting remote objects into the
    repository (ditto you'd probably want it for write_sha1_file(),
    since you could be applying a patch from an untrusted source).

-Peff

Reply via email to