On Fri, Oct 14, 2016 at 10:39:52AM -0700, Junio C Hamano wrote:

> Jeff King <p...@peff.net> writes:
> > So it's certainly better. But 7500 packs is just silly, and squeezing
> > out ~400ms there is hardly worth it. If you repack this same case into a
> > single pack, the command drops to 5ms. So yes, there's close to an order
> > of magnitude speedup here, but you get that _and_ another order of
> > magnitude just by repacking.
> "7500 is silly" equally applies to the "quick" (and sloppy, if I am
> reading your "Failing in this direction doesn't make me feel great."
> correctly) approach, I think, which argues for not taking either
> change X-<.

I wouldn't quite agree that they're the same. 7500 packs is silly
because a bunch of _other_ things will get equally slow or slower
before the prepare_packed_git() slowdown is noticeable. And it's within
the user's power to clean up, so we should encourage users to do so.

Whereas having a bunch of unfetched tags is a state that may linger
indefinitely, and be outside the user's control.
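For context, the non-quick lookup we're talking about follows roughly
this pattern (a sketch with hypothetical helper names, not git's actual
code): every miss triggers a fresh scan of the pack directory, so a
fetch that probes many tags it doesn't have pays one directory re-read
per missing object.

  /*
   * Rough sketch of the non-"quick" lookup pattern under discussion,
   * using hypothetical helpers (this is not git's actual code).
   */

  /* hypothetical: search the packs we have already mapped in core */
  extern int lookup_in_known_packs(const unsigned char *sha1);
  /* hypothetical: re-read $GIT_DIR/objects/pack, a la reprepare_packed_git() */
  extern void rescan_pack_directory(void);

  static int have_object_careful(const unsigned char *sha1)
  {
          if (lookup_in_known_packs(sha1))
                  return 1;
          /*
           * Miss: a concurrent repack may have moved the object into a
           * pack we have not loaded yet, so re-read the pack directory
           * and look once more before declaring the object missing.
           */
          rescan_pack_directory();
          return lookup_in_known_packs(sha1);
  }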

I _am_ open to the argument that calling reprepare_packed_git() over and
over doesn't really matter much if you have a sane number of packs. If
you tweak my perf test like so:

diff --git a/t/perf/p5550-fetch-tags.sh b/t/perf/p5550-fetch-tags.sh
index a5dc39f..7e7ae24 100755
--- a/t/perf/p5550-fetch-tags.sh
+++ b/t/perf/p5550-fetch-tags.sh
@@ -86,7 +86,7 @@ test_expect_success 'create child packs' '
                cd child &&
                git config gc.auto 0 &&
                git config gc.autopacklimit 0 &&
-               create_packs 500
+               create_packs 10

you get:

Test            origin            quick                 
5550.4: fetch   0.06(0.02+0.02)   0.02(0.01+0.00) -66.7%

Still an impressive speedup as a percentage, but negligible in absolute
terms. But that's on a local filesystem on a Linux machine. I'd worry
much more about a system with a slow readdir(), e.g., due to NFS.
Somebody's real-world NFS case[1] was what prompted us to do 0eeb077
(index-pack: avoid excessive re-reading of pack directory, 2015-06-09).

It looks like I _did_ look into optimizing this into a single stat()
call in the thread at [1]. I completely forgot about that. I did find
there that naively using stat_validity() on a directory is racy, though
I wonder if we could do something clever with gettimeofday() instead.
IOW, the patches there only bothered to re-read when they saw the mtime
on the directory jump, which suffers from 1-second precision problems.
But if we instead compared the mtime to the current time, we could err
in favor of re-reading the packs, and get false positives for at most 1
second.
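
To make that concrete, here's a minimal sketch of such a check (my own
illustration with a hypothetical pack_dir_maybe_changed(), not the
patches from the thread at [1]): if the directory's mtime is within the
last second we assume it may still be changing and re-read; otherwise we
only re-read when the cached stat data differs.

  #include <sys/stat.h>
  #include <time.h>

  /* Return 1 when the caller should re-read the pack directory. */
  static int pack_dir_maybe_changed(const char *pack_dir, struct stat *cached)
  {
          struct stat st;

          if (stat(pack_dir, &st) < 0)
                  return 1;       /* cannot tell; be conservative and re-read */

          /*
           * If the directory was modified within the last second, another
           * process may still be adding packs inside the same mtime tick,
           * so do not trust a cached scan yet.
           */
          if (time(NULL) - st.st_mtime <= 1)
                  return 1;

          if (st.st_mtime != cached->st_mtime ||
              st.st_ino != cached->st_ino ||
              st.st_dev != cached->st_dev) {
                  *cached = st;   /* remember what this scan corresponds to */
                  return 1;
          }
          return 0;
  }

That trades a few harmless extra re-reads right after a pack is written
for never missing one.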


> I agree that the fallout from the inaccuracy of "quick" approach is
> probably acceptable and the next "fetch" will correct it anyway, so
> let's do the "quick but inaccurate" for now and perhaps cook it in
> 'next' for a bit longer than other topics?

I doubt that cooking in 'next' for longer will turn up anything useful.
The case we care about is the race between a repack and a fetch. We
lived with the "quick" version of has_sha1_file() everywhere for 8
years. I only noticed the race on a hosting cluster which puts through
millions of operations per day (I could in theory test the patch on that
same cluster, but we actually don't do very many fetches).

