On Tue, Jan 29, 2013 at 1:19 PM, Jeff King <p...@peff.net> wrote:
> On Tue, Jan 29, 2013 at 07:58:01AM -0800, Junio C Hamano wrote:
>> The point is not about space. Disk is cheap, and it is not making
>> it any worse than what happens to your target audience, that is a
>> fetch-only repository with only "gc --auto" in it, where nobody
>> passes "-f" to "repack" to cause recomputation of delta.
>> What I was trying to seek was a way to reduce the runtime penalty we
>> pay every time we run git in such a repository.
>> - Object look-up cost will become log2(50*n) from 50*log2(n), which
>> is about 50/log2(50) improvement;
> Yes and no. Our heuristic is to look at the last-used pack for an
> object. So assuming we have locality of requests, we should quite often
> get "lucky" and find the object in the first log2 search. Even if we
> don't assume locality, a situation with one large pack and a few small
> packs will have the large one as "last used" more often than the others,
> and it will also have the looked-for object more often than the others
Opening all of those files does impact performance. It depends on how
slow your open(2) syscall is. I know on Mac OS X that its not the
fastest function we get from the C library. Performing ~40 opens to
look through the most recent pack files and finally find the "real"
pack that contains that tag you asked `git show` for isn't that quick.
Some of us also use Git on filesystems that are network based, and
slow compared to local disk Linux ext2/3/4 with gobs of free RAM.
> So I can see how it is something we could potentially optimize, but I
> could also see it being surprisingly not a big deal. I'd be very
> interested to see real measurements, even of something as simple as a
> "master index" which can reference multiple packfiles.
I actually tried this many many years ago. There are threads in the
archive about it. Its slower. We ruled it out.
>> - System resource cost we incur by having to keep 50 file
>> descriptors open and maintaining 50 mmap windows will reduce by
>> 50 fold.
> I wonder how measurable that is (and if it matters on Linux versus less
> efficient platforms).
It does matter. We know it has a negative impact on JGit even on Linux
for example. You don't want 300 packs in a repository. 50 might be
tolerable. 300 is not.
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html