On Fri, Sep 06, 2019 at 01:10:46AM +0200, Stephan Beyer wrote:
> I took a quick glance at yours. I also noticed the issue you address in
> [PATCH 2/6], but I was unsure if this is the way to go (I'm only
> occasionally reading on this list). I would prefer your patch series,
> with maybe one exception...
Yeah, we've been slowly removing these old "unsigned char *" references
(see a7db4c193d98f for the latest round touching this same code -- I'm
actually surprised I missed this one back then, as that's when
packlist_find() got converted).
> The thing is: I had *exactly* the same commit like your [PATCH 6/6]
> (except for the commit message and for the number), but I dropped it.
> Why? Because I had the feeling (no particular instance though) that the
> second locate_object_entry_hash() for each insertion *can* indeed take
> "too much" time. Also, I was wondering, if the "found = 1" case should
> be catched as a BUG("should not happen") or something.
>
> I don't care much, though. The performance impact should probably be
> checked carefully.
I did measure it, like this:
# Do the traversal separately. This would make any difference in the
# pack-objects hash code stand out _more_, plus it makes it cheaper to
# run multiple trials.
git rev-list --objects --all >input
# Make sure stderr is redirected to avoid progress, which again would
# amplify any differences.
git pack-objects --stdout --delta-base-offset <input >/dev/null 2>&1
Running on linux.git, I got:
[before]
Benchmark #1: git pack-objects --stdout --delta-base-offset <input >/dev/null
2>&1
Time (mean ± σ): 26.225 s ± 0.233 s [User: 24.089 s, System: 4.867
s]
Range (min … max): 25.915 s … 26.555 s 10 runs
[after]
Benchmark #1: git pack-objects --stdout --delta-base-offset <input >/dev/null
2>&1
Time (mean ± σ): 26.129 s ± 0.170 s [User: 24.003 s, System: 4.958
s]
Range (min … max): 25.974 s … 26.570 s 10 runs
So actually faster after, though not statistically significant. ;)
The BUG() on "found==1" might be worth doing.
-Peff