Hello friends and enemies from the lovevely Git Mailing list. I bring to you a patch series that implement a quite interesting performance optimization: the removal of the "Counting Objects" phase during `pack-objects` by using a pre-computed bitmap to find the reachable objects in the packfile.
As you probably know, Shawn Pearce designed this approach a few months ago and implemented it for JGit, with very exciting results. This is a not-so-straightforward port of his original design: The general approach is the same, but unfortunately we were not able to re-use JGit's original on-disk format for the `.bitmap` files. There is a full technical spec for the new format (v2) in patch 09, including benchmarks and rationale for the new design. The gist of it is that JGit's original format is not `mmap`able (JGit tends to not mmap anything), and that becomes very costly in practice with `upload-pack`, which spawns a new process for every upload. The header and metadata for both formats are however compatible, so it should be trivial to update JGit to read/write this format too. I intend to do this on the coming weeks, and I also hope that the v2 implementation will be slightly faster than the actual, even with the shortcomings of the JVM. The patch series, although massive, is rather straightforward. Most of the patches are isolated refactorings that enable access to a few functions that were previously hidden (re. packfile data). These functions are needed for reading and writing the bitmap indexes. Patch 03 is worth noting because it implements a performance optimization for `pack-objects` which isn't particularly good in normal invocations (~10% speed up) but that will show great benefits in later patches when it comes to writing the bitmap indexes. Patch 10 is the core of the series, implementing the actual loading of bitmap indexes and optimizing the Counting Objects phase of `pack-objects`. Like with every other patch that offers performance improvements, sample benchmarks are provided (spoiler: they are pretty fucking cool). Patch 11 and 16 are samples of using the new Bitmap traversal API to speed up other parts of Git (`rev-list --objects` and `rev-list --count`, respectively). Patch 12, 13 and 15 implement the actual writing of bitmap indexes. Like JGit, patch 12 enables writing a bitmap index as part of the `pack-objects` process (and hence as part of a normal `gc` run). On top of that, I implemented a new plumbing command in patch 15 that allows to write bitmap indexes for already-existing packfiles. I'd love your feedback on the design and implementation of this feature. I deem it rather stable, as we've been testing it on production on the world's largest Git host (Git Hub Dot Com The Web Site) with good results, so I'd love it to have it upstreamed on Core Git. Strawberry kisses, vmg Jeff King (1): list-objects: mark tree as unparsed when we free its buffer Vicent Marti (15): sha1_file: refactor into `find_pack_object_pos` pack-objects: use a faster hash table pack-objects: make `pack_name_hash` global revision: allow setting custom limiter function sha1_file: export `git_open_noatime` compat: add endinanness helpers ewah: compressed bitmap implementation documentation: add documentation for the bitmap format pack-objects: use bitmaps when packing objects rev-list: add bitmap mode to speed up lists pack-objects: implement bitmap writing repack: consider bitmaps when performing repacks sha1_file: implement `nth_packed_object_info` write-bitmap: implement new git command to write bitmaps rev-list: Optimize --count using bitmaps too Documentation/technical/bitmap-format.txt | 235 ++++++++ Makefile | 11 + builtin.h | 1 + builtin/pack-objects.c | 362 +++++++----- builtin/pack-objects.h | 33 ++ builtin/rev-list.c | 35 +- builtin/write-bitmap.c | 256 +++++++++ cache.h | 5 + ewah/bitmap.c | 229 ++++++++ ewah/ewah_bitmap.c | 703 ++++++++++++++++++++++++ ewah/ewah_io.c | 199 +++++++ ewah/ewah_rlw.c | 124 +++++ ewah/ewok.h | 194 +++++++ ewah/ewok_rlw.h | 114 ++++ git-compat-util.h | 28 + git-repack.sh | 10 +- git.c | 1 + khash.h | 329 +++++++++++ list-objects.c | 1 + pack-bitmap-write.c | 520 ++++++++++++++++++ pack-bitmap.c | 855 +++++++++++++++++++++++++++++ pack-bitmap.h | 64 +++ pack-write.c | 2 + revision.c | 5 + revision.h | 2 + sha1_file.c | 57 +- 26 files changed, 4212 insertions(+), 163 deletions(-) create mode 100644 Documentation/technical/bitmap-format.txt create mode 100644 builtin/pack-objects.h create mode 100644 builtin/write-bitmap.c create mode 100644 ewah/bitmap.c create mode 100644 ewah/ewah_bitmap.c create mode 100644 ewah/ewah_io.c create mode 100644 ewah/ewah_rlw.c create mode 100644 ewah/ewok.h create mode 100644 ewah/ewok_rlw.h create mode 100644 khash.h create mode 100644 pack-bitmap-write.c create mode 100644 pack-bitmap.c create mode 100644 pack-bitmap.h -- 188.8.131.52 -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html