This is the cleaned-up version of the commit caching patches I mentioned.

The basic idea is to generate a cache file that sits alongside a
packfile and contains the timestamp, tree, and parents of each commit
in a more compact and easy-to-access format.
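
To make the idea concrete, here is a minimal sketch of what one cache
record and its lookup could look like. It is not taken from the
patches: it assumes fixed-size records sorted by commit sha1, with all
of a commit's fields stored contiguously (the commit's own sha1, a
32-bit big-endian timestamp, the tree sha1, and up to two parent
sha1s). That layout happens to add up to the 84 bytes per commit
quoted for the current format. The names `struct commit_record`,
`find_commit_record`, and `record_timestamp` are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SHA1_LEN 20

/* Hypothetical on-disk record; every field for one commit sits together. */
struct commit_record {
	unsigned char sha1[SHA1_LEN];    /* the commit itself (sort key) */
	unsigned char timestamp[4];      /* commit date, big-endian */
	unsigned char tree[SHA1_LEN];
	unsigned char parent1[SHA1_LEN]; /* all zeros if absent */
	unsigned char parent2[SHA1_LEN]; /* all zeros if absent */
};

/* Only byte arrays, so there is no padding: 20 + 4 + 20 + 20 + 20. */
_Static_assert(sizeof(struct commit_record) == 84, "record must be 84 bytes");

/* bsearch comparator: key is a raw sha1, elem is a record. */
static int cmp_record(const void *key, const void *elem)
{
	const struct commit_record *r = elem;
	return memcmp(key, r->sha1, SHA1_LEN);
}

/* Binary search over records sorted by sha1; NULL if not cached. */
const struct commit_record *find_commit_record(const struct commit_record *recs,
					       size_t nr,
					       const unsigned char *sha1)
{
	return bsearch(sha1, recs, nr, sizeof(*recs), cmp_record);
}

/* Decode the big-endian timestamp without any parsing of commit text. */
uint32_t record_timestamp(const struct commit_record *r)
{
	return ((uint32_t)r->timestamp[0] << 24) |
	       ((uint32_t)r->timestamp[1] << 16) |
	       ((uint32_t)r->timestamp[2] << 8) |
	       (uint32_t)r->timestamp[3];
}
```

With the file mmap'd, a lookup like this touches one contiguous record
instead of inflating and parsing the commit object.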

The timings from this one are roughly similar to what I posted earlier.
Unlike the earlier version, this one keeps the data for a single commit
together for better cache locality (though I don't think it made a big
difference in my tests, since my cold-cache timing test ends up touching
every commit anyway). The short of it is that for an extra 31M of disk
space (~4%), I get a warm-cache speedup for "git rev-list --all" from
~4.2s to ~0.66s.

The big thing it does not (yet) do is use offsets to reference sha1s, as
Shawn suggested. This would potentially drop the on-disk size from 84
bytes to 16 bytes per commit (or about 6M total for linux.git).
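
One way the 16-byte figure could work out is sketched below, under
assumptions that are mine rather than the patches': records are stored
in the same order as a sorted sha1 table (such as the pack's .idx), so
the commit's own sha1 is implied by the record's position, and the tree
and parents are stored as 32-bit positions in that table instead of
20-byte sha1s. That leaves four 4-byte fields. `struct compact_commit`,
`encode_compact`, `decode_compact`, and the `NO_PARENT` sentinel are
hypothetical names.

```c
#include <stdint.h>

/* Hypothetical sentinel for a missing parent. */
#define NO_PARENT UINT32_MAX

/* In-memory view of a hypothetical 16-byte record. */
struct compact_commit {
	uint32_t timestamp;
	uint32_t tree;    /* position of the tree's sha1 in the sorted table */
	uint32_t parent1; /* position, or NO_PARENT */
	uint32_t parent2; /* position, or NO_PARENT */
};

static void put_be32(unsigned char *p, uint32_t v)
{
	p[0] = (unsigned char)(v >> 24);
	p[1] = (unsigned char)(v >> 16);
	p[2] = (unsigned char)(v >> 8);
	p[3] = (unsigned char)v;
}

static uint32_t get_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Serialize as four big-endian fields: 16 bytes instead of 84. */
void encode_compact(unsigned char out[16], const struct compact_commit *c)
{
	put_be32(out, c->timestamp);
	put_be32(out + 4, c->tree);
	put_be32(out + 8, c->parent1);
	put_be32(out + 12, c->parent2);
}

void decode_compact(struct compact_commit *c, const unsigned char in[16])
{
	c->timestamp = get_be32(in);
	c->tree = get_be32(in + 4);
	c->parent1 = get_be32(in + 8);
	c->parent2 = get_be32(in + 12);
}
```

Resolving a position back to a sha1 is then a direct index into the
sorted table. A real format would also need some answer for trees or
parents that live outside the pack, which this sketch ignores.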

Coupled with using compression level 0 for trees (which compress so
poorly that leaving them uncompressed costs only a 2% increase in
size), this brings my "git rev-list --objects --all" time down from
~40s to ~25s. Perf reveals that we're spending most of the remaining
time in lookup_object. I've spent a fair bit of time trying to optimize
that, but with no luck; I think it's fairly close to optimal. The
problem is just that we call it a very large number of times, since it
is the mechanism by which we recognize that we have already processed
each object.
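
For context, lookup_object maps a sha1 to its in-memory object so a
traversal can tell whether it has already seen that object. Below is a
simplified sketch of that kind of table: open addressing with linear
probing, indexed by the leading bytes of the sha1 (which are already
uniformly distributed, so no extra hashing is needed). It is an
illustration, not git's actual code; `struct obj`, `struct obj_table`,
`lookup_obj`, and `add_obj` are hypothetical names.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SHA1_LEN 20

struct obj {
	unsigned char sha1[SHA1_LEN];
	unsigned flags; /* e.g. a "seen" bit set by the traversal */
};

struct obj_table {
	struct obj **slots;
	size_t size; /* always a power of two (or 0 when empty) */
	size_t nr;
};

/* Use the first 4 sha1 bytes directly as the hash. */
static size_t hash_sha1(const unsigned char *sha1, size_t size)
{
	uint32_t h;
	memcpy(&h, sha1, sizeof(h));
	return h & (size - 1);
}

/* Probe from the home slot until a match or an empty slot. */
struct obj *lookup_obj(const struct obj_table *t, const unsigned char *sha1)
{
	size_t i;

	if (!t->size)
		return NULL;
	i = hash_sha1(sha1, t->size);
	while (t->slots[i]) {
		if (!memcmp(t->slots[i]->sha1, sha1, SHA1_LEN))
			return t->slots[i];
		i = (i + 1) & (t->size - 1);
	}
	return NULL;
}

static void insert_slot(struct obj **slots, size_t size, struct obj *o)
{
	size_t i = hash_sha1(o->sha1, size);
	while (slots[i])
		i = (i + 1) & (size - 1);
	slots[i] = o;
}

/* Caller must check lookup_obj first; duplicates are not detected. */
int add_obj(struct obj_table *t, struct obj *o)
{
	/* Keep the table at most half full so probe runs stay short. */
	if (2 * (t->nr + 1) > t->size) {
		size_t new_size = t->size ? 2 * t->size : 32;
		struct obj **new_slots = calloc(new_size, sizeof(*new_slots));
		size_t i;

		if (!new_slots)
			return -1;
		for (i = 0; i < t->size; i++)
			if (t->slots[i])
				insert_slot(new_slots, new_size, t->slots[i]);
		free(t->slots);
		t->slots = new_slots;
		t->size = new_size;
	}
	insert_slot(t->slots, t->size, o);
	t->nr++;
	return 0;
}
```

Each call is only a handful of memory accesses, but a full
"--objects" walk makes one for every object reference it encounters,
so the total cost tracks the number of calls rather than the cost of
any one of them.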

  [1/6]: csum-file: make sha1write const-correct
  [2/6]: strbuf: add string-chomping functions
  [3/6]: introduce pack metadata cache files
  [4/6]: introduce a commit metapack
  [5/6]: add git-metapack command
  [6/6]: commit: look up commit info in metapack