On Tue, Jan 29, 2013 at 07:58:01AM -0800, Junio C Hamano wrote:
> The point is not about space. Disk is cheap, and it is not making
> it any worse than what happens to your target audience, that is a
> fetch-only repository with only "gc --auto" in it, where nobody
> passes "-f" to "repack" to cause recomputation of delta.
> What I was trying to seek was a way to reduce the runtime penalty we
> pay every time we run git in such a repository.
> - Object look-up cost will become log2(50*n) from 50*log2(n), which
> is about 50/log2(50) improvement;
Yes and no. Our heuristic is to look at the last-used pack for an
object. So assuming we have locality of requests, we should quite often
get "lucky" and find the object in the first log2 search. Even if we
don't assume locality, a situation with one large pack and a few small
packs will have the large one as "last used" more often than the others,
and it will also have the looked-for object more often than the others.
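
A toy model makes the heuristic concrete. The sketch below is not git's
code: the function names (make_packs, contains, packs_searched) and the
pack sizes are made up for illustration. Each pack is a sorted list of
integer "object ids", every lookup tries the last-used pack first, and
the script counts how many per-pack binary searches a lookup needs on
average when requests are spread uniformly, with no locality at all.

```python
import bisect
import random


def make_packs(sizes, seed=0):
    """Hand out distinct integer 'object ids' to packs of the given sizes."""
    rng = random.Random(seed)
    ids = rng.sample(range(10**9), sum(sizes))
    packs, start = [], 0
    for size in sizes:
        packs.append(sorted(ids[start:start + size]))
        start += size
    return packs


def contains(sorted_ids, oid):
    """Binary search: one log2(n) lookup in a single pack index."""
    i = bisect.bisect_left(sorted_ids, oid)
    return i < len(sorted_ids) and sorted_ids[i] == oid


def packs_searched(packs, requests):
    """Total per-pack searches when the last-used pack is tried first."""
    last = 0  # assumption: start with pack 0 as the "last used" pack
    total = 0
    for oid in requests:
        order = [last] + [i for i in range(len(packs)) if i != last]
        for i in order:
            total += 1
            if contains(packs[i], oid):
                last = i
                break
    return total


if __name__ == "__main__":
    # Hypothetical layout: one large pack plus 49 small ones.
    packs = make_packs([100_000] + [100] * 49)
    all_ids = [oid for pack in packs for oid in pack]
    rng = random.Random(1)
    requests = [rng.choice(all_ids) for _ in range(20_000)]
    average = packs_searched(packs, requests) / len(requests)
    print(f"average packs searched per lookup: {average:.2f}")
```

A single master index would need exactly one search per lookup, so the
gap between that and the printed average is the most a combined index
could save in this model. It says nothing about real repositories,
where request patterns and pack layouts differ.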
So I can see how it is something we could potentially optimize, but I
could also see it being surprisingly not a big deal. I'd be very
interested to see real measurements, even of something as simple as a
"master index" which can reference multiple packfiles.
> - System resource cost we incur by having to keep 50 file
> descriptors open and maintaining 50 mmap windows will reduce by
> 50 fold.
I wonder how measurable that is (and if it matters on Linux versus less
> > I would be interested to see the timing on how quick it is compared to a
> > real repack,...
> Yes, that is what I meant by "wonder if we would be helped by" ;-)
There is only one way to find out... :)
Maybe I am blessed with nice machines, but I have mostly found the
repack process not to be that big a deal these days (especially with
threaded delta compression).
> > But how do these somewhat mediocre concatenated packs get turned into
> > real packs?
> How do they get processed in fetch-only repositories that
> sometimes run "gc --auto" today? By running "repack -a -d -f"
> occasionally, perhaps?
Do we run "repack -adf" regularly? The usual "git gc" procedure will not
use "-f", and without that, we will not even consider making deltas
between objects that were formerly in different packs (but now are in
the same pack).
So you are avoiding doing medium-effort packs ("repack -ad") in favor of
doing potentially quick packs, but occasionally doing a big-effort pack
("repack -adf"). It may be reasonable advice to "repack -adf"
occasionally, but I suspect most people are not doing it regularly (if
only because "git gc" does not do it by default).
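
For anyone who wants numbers on the medium-effort versus big-effort
question, a rough timing harness might look like the sketch below. The
helper names (repack_command, time_command) are hypothetical, and the
only git invocations used are "git repack -a -d" and
"git repack -a -d -f". Repacking rewrites the object store, so this
should only be pointed at a scratch copy of a repository, and each
variant should ideally start from a fresh copy, since the second run
otherwise sees the pack the first one produced.

```python
import subprocess
import sys
import time


def repack_command(force_deltas=False):
    """Build the argv for 'git repack -a -d', adding '-f' when requested."""
    cmd = ["git", "repack", "-a", "-d"]
    if force_deltas:
        cmd.append("-f")  # recompute deltas instead of reusing existing ones
    return cmd


def time_command(cmd, cwd=None):
    """Run cmd quietly and return (elapsed wall-clock seconds, exit status)."""
    start = time.perf_counter()
    result = subprocess.run(
        cmd,
        cwd=cwd,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return time.perf_counter() - start, result.returncode


if __name__ == "__main__":
    # Assumption: the first argument is the path to a scratch repository.
    repo = sys.argv[1] if len(sys.argv) > 1 else "."
    for force in (False, True):
        cmd = repack_command(force)
        elapsed, status = time_command(cmd, cwd=repo)
        print(f"{' '.join(cmd)}: {elapsed:.1f}s (exit status {status})")
```

Wall-clock time is only part of the picture. Comparing the resulting
pack sizes would show what the extra effort of "-f" actually buys.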