I had a window of opportunity last week to hack intensely on Git, with
the following goals:
* Separate `ref_cache` out of `files_ref_cache`.
* Separate a new `packed_ref_cache` class out of `files_ref_cache`.
Change the latter to use an instance of the former for all of its
interactions with the `packed-refs` file.
* Mmap `packed-refs` files rather than reading-and-parsing.
* Use the mmapped version of the `packed-refs` file as the "cache"
rather than using a separate `ref_cache`.
* (And the main goal): Avoid reading and parsing the *whole
`packed-refs` file* (as we do now) every time any part of it is
needed. Instead, use binary search to find the reference and/or
range of references that we want, and parse the info out of the
mmapped image on the fly.
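To make the main goal concrete, here is a minimal sketch of a binary search over an in-memory image of a sorted, line-oriented file. It is not the code from the series. It assumes every record has the form `<40 hex digits> <refname>\n`, that records are sorted bytewise by refname, and that there are no header or peeled (`^`) lines. The names `find_packed_ref()`, `line_start()`, `next_line()` and `cmp_record()` are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define HEX_LEN 40 /* assumed length of the hex object name */

/* Back up from p to the start of its line, never going before buf. */
static const char *line_start(const char *buf, const char *p)
{
	while (p > buf && p[-1] != '\n')
		p--;
	return p;
}

/* Return the start of the line following the one that contains p. */
static const char *next_line(const char *p, const char *end)
{
	const char *nl = memchr(p, '\n', end - p);
	return nl ? nl + 1 : end;
}

/* Compare the refname of the record at rec with the string name. */
static int cmp_record(const char *rec, const char *end, const char *name)
{
	const char *r = rec + HEX_LEN + 1; /* skip "<hex> " */
	const char *eol = memchr(r, '\n', end - r);
	size_t rlen = (eol ? eol : end) - r;
	size_t nlen = strlen(name);
	int c = memcmp(r, name, rlen < nlen ? rlen : nlen);

	if (c)
		return c;
	return (rlen > nlen) - (rlen < nlen);
}

/*
 * Return a pointer to the record for name inside buf[0..len), or
 * NULL if there is none. Only the records that are probed get parsed.
 */
const char *find_packed_ref(const char *buf, size_t len, const char *name)
{
	const char *end = buf + len;
	const char *lo = buf, *hi = end; /* both always sit on line starts */

	assert(buf && name);
	while (lo < hi) {
		const char *mid = line_start(lo, lo + (hi - lo) / 2);
		int c = cmp_record(mid, end, name);

		if (!c)
			return mid;
		if (c < 0)
			lo = next_line(mid, end);
		else
			hi = mid;
	}
	return NULL;
}
```

In this sketch the buffer could just as well come from `mmap()` as from a string literal; the search touches only O(log n) records instead of parsing the whole file.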
I've completed a draft of an epic 48-patch series implementing all of
the above points. It is available from my GitHub fork [1] as branch
`wip/mmap-packed-refs`. It dramatically improves performance and
reduces memory usage for some tasks in repositories with very many
packed references.
But the later parts of that series aren't completely polished yet, and
such a large patch series would be indigestible anyway, so here I
submit the first part...
This patch series extracts a `ref_cache` module out of
`files_ref_cache`, and goes some way to disentangling those two
modules, which until now were overly intimate with each other:
* Remove `verify_refname_available()` from the refs VTABLE, instead
implementing it in a generic way that uses only the usual refs API
to talk to the `ref_store`.
* Split `ref_cache`-related code into a new module,
`refs/ref-cache.{c,h}`. Encapsulate the data structure in a new
class, `struct ref_cache`.
* Change the lazy-filling mechanism of `ref_cache` to call back to its
backing `ref_store` via a callback function rather than calling
`read_loose_refs()` directly.
* Move the special handling of `refs/bisect/` from `ref_cache` to
`files_ref_store`.
* Make `cache_ref_iterator_begin()` smarter, and change external users
to iterate via this interface instead of using
`do_for_each_entry_in_dir()`.
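The callback-based lazy filling described in the list above can be pictured roughly as follows. This is only a toy model under my own assumptions: `struct demo_cache`, `demo_fill_fn`, `demo_cache_count()` and `demo_fill_two()` are hypothetical names, and the real `struct ref_cache` is considerably more involved.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX 8 /* arbitrary capacity for this sketch */

struct demo_cache;

/* The backing store supplies a function of this type to fill the cache. */
typedef void demo_fill_fn(struct demo_cache *cache, void *cb_data);

struct demo_cache {
	demo_fill_fn *fill;	/* how to populate the cache */
	void *cb_data;		/* opaque pointer owned by the backing store */
	int filled;		/* has fill() been called yet? */
	const char *names[DEMO_MAX];
	size_t nr;
};

void demo_cache_init(struct demo_cache *cache, demo_fill_fn *fill,
		     void *cb_data)
{
	cache->fill = fill;
	cache->cb_data = cb_data;
	cache->filled = 0;
	cache->nr = 0;
}

void demo_cache_add(struct demo_cache *cache, const char *name)
{
	assert(cache->nr < DEMO_MAX);
	cache->names[cache->nr++] = name;
}

/*
 * The cache has no idea where entries come from; on first use it
 * asks the backing store via the callback, and never again after that.
 */
size_t demo_cache_count(struct demo_cache *cache)
{
	if (!cache->filled) {
		cache->filled = 1;
		cache->fill(cache, cache->cb_data);
	}
	return cache->nr;
}

/* Example backing store: adds two entries and counts its invocations. */
void demo_fill_two(struct demo_cache *cache, void *cb_data)
{
	int *calls = cb_data;

	(*calls)++;
	demo_cache_add(cache, "refs/heads/main");
	demo_cache_add(cache, "refs/heads/topic");
}
```

The point of the indirection is that the cache module no longer needs to know about loose references at all; a different backing store only has to pass a different fill function.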
Even after this patch series, the modules are still too intimate for
my taste, but I think this is a big step forward, and it is enough to
allow the other changes that I've been working on.
These patches depend on Duy's nd/files-backend-git-dir branch, v6 [2].
They are also available from my GitHub fork [1] as branch
`separate-ref-cache`.
Happily, this patch series actually removes a few more lines than it
adds, mostly thanks to the simpler `verify_refname_available()`
implementation.
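For readers unfamiliar with what this check is about: a reference name is unavailable if it would create a directory/file conflict with an existing reference, for example `refs/heads/foo` versus `refs/heads/foo/bar`. The sketch below shows only that rule, over a plain array of names. It is not the implementation in the series (which goes through the refs API and handles further details not modeled here), and `demo_refname_available()` is a made-up name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return 1 if refname could coexist with every name in existing[],
 * 0 if it would cause a directory/file conflict. An existing
 * reference with exactly the same name is not treated as a conflict.
 */
int demo_refname_available(const char *refname,
			   const char *const *existing, size_t nr)
{
	size_t len, i;

	assert(refname);
	len = strlen(refname);
	for (i = 0; i < nr; i++) {
		const char *e = existing[i];
		size_t elen = strlen(e);

		/* An existing ref occupies a leading "directory" of refname. */
		if (elen < len && !strncmp(refname, e, elen) &&
		    refname[elen] == '/')
			return 0;
		/* refname would occupy a leading "directory" of an existing ref. */
		if (len < elen && !strncmp(e, refname, len) &&
		    e[len] == '/')
			return 0;
	}
	return 1;
}
```

With `refs/heads/foo` already present, this sketch rejects `refs/heads/foo/bar` but accepts `refs/heads/foobar`, since only whole path components count.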
Michael
[1] https://github.com/mhagger/git
[2] http://public-inbox.org/git/[email protected]/
Michael Haggerty (20):
get_ref_dir(): don't call read_loose_refs() for "refs/bisect"
refs_read_raw_ref(): new function
refs_ref_iterator_begin(): new function
refs_verify_refname_available(): implement once for all backends
refs_verify_refname_available(): use function in more places
Rename `add_ref()` to `add_ref_entry()`
Rename `find_ref()` to `find_ref_entry()`
Rename `remove_entry()` to `remove_entry_from_dir()`
refs: split `ref_cache` code into separate files
ref-cache: introduce a new type, ref_cache
refs: record the ref_store in ref_cache, not ref_dir
ref-cache: use a callback function to fill the cache
refs: handle "refs/bisect/" in `loose_fill_ref_dir()`
do_for_each_entry_in_dir(): eliminate `offset` argument
get_loose_ref_dir(): function renamed from get_loose_refs()
get_loose_ref_cache(): new function
cache_ref_iterator_begin(): make function smarter
commit_packed_refs(): use reference iteration
files_pack_refs(): use reference iteration
do_for_each_entry_in_dir(): delete function
Makefile | 1 +
refs.c | 111 ++++-
refs.h | 2 +-
refs/files-backend.c | 1229 +++++++-------------------------------------------
refs/ref-cache.c | 523 +++++++++++++++++++++
refs/ref-cache.h | 267 +++++++++++
refs/refs-internal.h | 22 +-
7 files changed, 1066 insertions(+), 1089 deletions(-)
create mode 100644 refs/ref-cache.c
create mode 100644 refs/ref-cache.h
--
2.11.0