This is the cleaned-up version of the commit caching patches I mentioned
earlier.

The basic idea is to generate a cache file that sits alongside a
packfile and contains the timestamp, tree, and parents in a more compact
and easy-to-access format.

The timings from this one are roughly similar to what I posted earlier.
Unlike the earlier version, this one keeps the data for a single commit
together for better cache locality (though I don't think it made a big
difference in my tests, since my cold-cache timing test ends up touching
every commit anyway).  The short of it is that for an extra 31M of disk
space (~4%), "git rev-list --all" with a warm cache drops from ~4.2s to
~0.66s.

The big thing it does not (yet) do is use offsets to reference sha1s, as
Shawn suggested.  This would potentially drop the on-disk size from 84
bytes to 16 bytes per commit (or about 6M total for linux.git).
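
To make the size arithmetic concrete, here is a rough sketch of two
record layouts that add up to 84 and 16 bytes. The field breakdown is an
assumption chosen to match those totals (a commit sha1, a 4-byte
timestamp, a tree, and two parent slots versus four 32-bit values); the
struct and function names are hypothetical and are not taken from the
patches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SHA1_RAWSZ 20	/* raw sha1 length in bytes */

/* Assumed layout of one record when every object is named by sha1. */
struct record_by_sha1 {
	unsigned char commit[SHA1_RAWSZ];
	uint32_t timestamp;
	unsigned char tree[SHA1_RAWSZ];
	unsigned char parent1[SHA1_RAWSZ];
	unsigned char parent2[SHA1_RAWSZ];
};

/*
 * Assumed layout when tree and parents are named by a 32-bit position
 * (for example, an offset into the pack index) instead of a sha1.
 */
struct record_by_offset {
	uint32_t timestamp;
	uint32_t tree;
	uint32_t parent1;
	uint32_t parent2;
};

/* Number of whole records of record_size bytes that fit in total_bytes. */
uint64_t records_in(uint64_t total_bytes, size_t record_size)
{
	assert(record_size > 0);
	return total_bytes / record_size;
}

/* Bytes needed to store nr_commits records of record_size bytes each. */
uint64_t metapack_bytes(uint64_t nr_commits, size_t record_size)
{
	return nr_commits * (uint64_t)record_size;
}
```

Reading the 31M figure as 31 * 1024 * 1024 bytes and ignoring any file
header, records_in() gives roughly 387K records at 84 bytes each, and
metapack_bytes() for the same count at 16 bytes each comes to a little
under 6M, which is in line with the estimate above.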

Coupled with using compression level 0 for trees (which do not compress
well at all, and yield only a 2% increase in size when left
uncompressed), my "git rev-list --objects --all" time drops from ~40s to
~25s. Perf reveals that we're spending most of the remaining time in
lookup_object. I've spent a fair bit of time trying to optimize that,
but with no luck; I think it's fairly close to optimal. The problem is
just that we call it a very large number of times, since it is the
mechanism by which we recognize that we have already processed each
object.
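
As an illustration of why the call count matters more than the cost of a
single call, here is a simplified stand-in for a "have we seen this
object?" check. It is not git's lookup_object; the names are
hypothetical, the table is a fixed size (the caller must make it larger
than the number of distinct objects), and there is no resizing.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SHA1_RAWSZ 20

/* Hypothetical open-addressing set of sha1s with linear probing. */
struct seen_table {
	unsigned char (*sha1)[SHA1_RAWSZ];	/* slot keys */
	unsigned char *used;			/* 1 if the slot is occupied */
	size_t size;				/* slot count, a power of two */
	unsigned long lookups;			/* how many times we were asked */
};

/* Allocate an empty table; returns 0 on success, -1 on allocation failure. */
int seen_init(struct seen_table *t, size_t size)
{
	assert(size && (size & (size - 1)) == 0);
	t->sha1 = calloc(size, SHA1_RAWSZ);
	t->used = calloc(size, 1);
	t->size = size;
	t->lookups = 0;
	return (t->sha1 && t->used) ? 0 : -1;
}

/* Returns 1 if sha1 was already seen; otherwise records it and returns 0. */
int seen_check_and_mark(struct seen_table *t, const unsigned char *sha1)
{
	uint32_t h;
	size_t i;

	t->lookups++;
	/* sha1 bytes are already well mixed, so the first few serve as a hash */
	memcpy(&h, sha1, sizeof(h));
	for (i = h & (t->size - 1); t->used[i]; i = (i + 1) & (t->size - 1))
		if (!memcmp(t->sha1[i], sha1, SHA1_RAWSZ))
			return 1;
	memcpy(t->sha1[i], sha1, SHA1_RAWSZ);
	t->used[i] = 1;
	return 0;
}
```

The point of the lookups counter is that it grows with every reference
to an object, not with the number of distinct objects: a tree entry that
appears unchanged in many commits' trees still costs one lookup each
time it is reached, even though each probe is cheap.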

  [1/6]: csum-file: make sha1write const-correct
  [2/6]: strbuf: add string-chomping functions
  [3/6]: introduce pack metadata cache files
  [4/6]: introduce a commit metapack
  [5/6]: add git-metapack command
  [6/6]: commit: look up commit info in metapack
