Re: reftable [v3]: new ref storage format

2017-07-28 Thread Shawn Pearce
On Fri, Jul 28, 2017 at 7:18 PM, Michael Haggerty  wrote:
> On Fri, Jul 28, 2017 at 3:12 PM, Shawn Pearce  wrote:
>> I'm with you this far, and like the {min,max}_update_index in the
>> header. I'm concerned about update_index in 32 bits. At some point you
>> need to reset the counter, or the repository is broken. 4b updates is
>> enough for anyone? I'd feel better about this being a 64 bit field.
>
> Yes, I was a little bit nervous about 32 bits, too. But that's a *lot*
> of updates: one per second for 136 years. If that limit were ever
> reached, there could be a compaction step, where any update indices
> that don't have associated reflog entries are "compacted out" of the
> numerical sequence and the remaining indices are renumbered
> contiguously.

I considered that, but its a bit of a pain for the writer to renumber
the remaining records.

> But it's ok with me to make it 64 bits. Usually those extra bytes
> would be appear as  and so should prefix- and zlib-compress
> well.

That was my thought. Within a single reference these will prefix
compress right out, and zlib will fix any sins within the log block at
restart points.


Re: reftable [v3]: new ref storage format

2017-07-28 Thread Michael Haggerty
On Fri, Jul 28, 2017 at 3:12 PM, Shawn Pearce  wrote:
> I'm with you this far, and like the {min,max}_update_index in the
> header. I'm concerned about update_index in 32 bits. At some point you
> need to reset the counter, or the repository is broken. 4b updates is
> enough for anyone? I'd feel better about this being a 64 bit field.

Yes, I was a little bit nervous about 32 bits, too. But that's a *lot*
of updates: one per second for 136 years. If that limit were ever
reached, there could be a compaction step, where any update indices
that don't have associated reflog entries are "compacted out" of the
numerical sequence and the remaining indices are renumbered
contiguously.

But it's ok with me to make it 64 bits. Usually those extra bytes
would be appear as  and so should prefix- and zlib-compress
well.

Michael


Re: reftable [v3]: new ref storage format

2017-07-28 Thread Shawn Pearce
On Thu, Jul 27, 2017 at 7:28 AM, Michael Haggerty  wrote:
> On Sat, Jul 22, 2017 at 11:29 AM, Shawn Pearce  wrote:
>> 3rd iteration of the reftable storage format.
>>
>> [...]
>> ### Objectives
>>
>> - Near constant time lookup for any single reference, even when the
>>   repository is cold and not in process or kernel cache.
>> - Near constant time verification a SHA-1 is referred to by at least
>>   one reference (for allow-tip-sha1-in-want).
>> - Efficient lookup of an entire namespace, such as `refs/tags/`.
>> - Support atomic push `O(size_of_update)` operations.
>> - Combine reflog storage with ref storage.
>
> I think that the optimization enabled by obj_records is only
> interesting for server-side repositories that will frequently be
> fetched from, and which additionally have allow-tip-sha1-in-want
> turned on, right? So could we make it explicitly optional?

Correct. Its already optional. Maybe I'm just not clear enough about
that in the document. I agree its not required in the file, and
writers can choose to omit obj_record.


>>  obj record
>>
>> An `obj_record` describes a single object abbreviation, and the blocks
>> containing references using that unique abbreviation:
>>
>> varint( prefix_length )
>> varint( (suffix_length << 3) | cnt_3 )
>> suffix
>> varint( cnt_rest )?
>> varint( block_delta )+
>>
> [...]
>> Each record contains `block_count` number of block identifiers for ref
>> blocks.  The `block_count` is determined by:
>>
>> block_count = cnt_3
>> if (cnt_3 == 0x7) {
>>   block_count += cnt_rest
>> }
>>
>> The `cnt_rest` field is only present when `block_count >= 0x7` and
>> could overflow the `cnt_3` field available in the record start.  This
>> encoding scheme is used as the vast majority of abbreviations are
>> only one reference (or at most a few), and unlikely to exceed 6 blocks.
>
> I agree that this `cnt_3` handling is overly cute. There will never be
> records with `block_count == 0`, will there? Then I propose that
> `cnt_3` be set to zero if `block_count` is greater than 7, in which
> case the full block count is stored as a varint. It's simpler, and I
> think the only `block_count`s for which this scheme costs a byte more
> than your scheme are implausible: 128-134, 16383-16390, etc.

That is a great idea, thanks. I've updated the draft to reflect your
approach of cnt_3 = 0 indicates the actual count follows in a varint.


>> [...]
>> ### Log block format
>
> I'm still not happy about how you store reflogs. Here are my objections:
>
> * You store reflogs by appending a high-resolution time to the
> refname. This is awkward:
>   * There is always some uncertainty about uniqueness, even with a
> high-resolution clock. If you want to eliminate that uncertainty, you
> have to read the last entry in the reflog for the reference *for every
> update* to be sure that your timestamp is not already "in use".
>   * There is a need to invent microsecond values on systems with
> low-resolution clocks, again to ensure uniqueness.
>   * If there is clock skew, then either you lose information about the
> *order* of reference updates (which I would consider unacceptable), or
> you might have to invent fictional times for new updates to retain the
> order relative to previous updates. But inventing fictional times is
> very problematic: suppose somebody's clock is incorrectly set to the
> year 2020 when they make a commit. As far as Git is concerned, that is
> plausible, so the reflog entry gets that timestamp. Then the user
> corrects the clock and makes another commit. We would either be forced
> to rewrite the old reflog entry(ies) or to give a timestamp in 2020 to
> the new update.
> * With reftable storage, multiple-reference updates can be done
> atomically. But the reflogs don't retain the information about which
> references were updated together.
> * Even if all reflogs are retained, it is not possible to deduce the
> global state of all references at a moment in the past.
>
> I propose to solve all of the above problems, plus some others, as follows:
>
> Let's keep an integer `update_index`, which starts at zero and
> increases by one each atomic reference update (i.e., it increases by
> one even if the update touches multiple references).
>
> Let's store reflog entries under keys `(refname, update_index)`, and
> store the timestamp of the entry *within* the record rather than as
> part of the key. The tuple `(refname, update_index)` has to be encoded
> somehow that ensures the desired order; see below. This encoding
> ensures that the order of reference updates is well-defined regardless
> of clock skew, and that one can use the reflog to determine which
> references were updated together and what the complete state of
> references was after any update (assuming that the user retains all
> reflogs).
>
> Let's store in the header of each reftable file the `min_update_index`
> and 

Re: reftable [v3]: new ref storage format

2017-07-27 Thread Michael Haggerty
Thanks for your continued work on this.

I have some comments about v3 (inline below).

On Sat, Jul 22, 2017 at 11:29 AM, Shawn Pearce  wrote:
> 3rd iteration of the reftable storage format.
>
> [...]
> ### Objectives
>
> - Near constant time lookup for any single reference, even when the
>   repository is cold and not in process or kernel cache.
> - Near constant time verification a SHA-1 is referred to by at least
>   one reference (for allow-tip-sha1-in-want).
> - Efficient lookup of an entire namespace, such as `refs/tags/`.
> - Support atomic push `O(size_of_update)` operations.
> - Combine reflog storage with ref storage.

I think that the optimization enabled by obj_records is only
interesting for server-side repositories that will frequently be
fetched from, and which additionally have allow-tip-sha1-in-want
turned on, right? So could we make it explicitly optional?

> [...]
>  obj record
>
> An `obj_record` describes a single object abbreviation, and the blocks
> containing references using that unique abbreviation:
>
> varint( prefix_length )
> varint( (suffix_length << 3) | cnt_3 )
> suffix
> varint( cnt_rest )?
> varint( block_delta )+
>
> Like in reference blocks, abbreviations are prefix compressed within
> an obj block.  On large reftable files with many unique objects,
> higher block sizes (64k), and higher restart interval (128), a
> `prefix_length` of 2 or 3 and `suffix_length` of 3 may be common in
> obj records (unique abbreviation of 5-6 raw bytes, 10-12 hex digits).
>
> Each record contains `block_count` number of block identifiers for ref
> blocks.  The `block_count` is determined by:
>
> block_count = cnt_3
> if (cnt_3 == 0x7) {
>   block_count += cnt_rest
> }
>
> The `cnt_rest` field is only present when `block_count >= 0x7` and
> could overflow the `cnt_3` field available in the record start.  This
> encoding scheme is used as the vast majority of abbreviations are
> only one reference (or at most a few), and unlikely to exceed 6 blocks.

I agree that this `cnt_3` handling is overly cute. There will never be
records with `block_count == 0`, will there? Then I propose that
`cnt_3` be set to zero if `block_count` is greater than 7, in which
case the full block count is stored as a varint. It's simpler, and I
think the only `block_count`s for which this scheme costs a byte more
than your scheme are implausible: 128-134, 16383-16390, etc.

> [...]
> ### Log block format

I'm still not happy about how you store reflogs. Here are my objections:

* You store reflogs by appending a high-resolution time to the
refname. This is awkward:
  * There is always some uncertainty about uniqueness, even with a
high-resolution clock. If you want to eliminate that uncertainty, you
have to read the last entry in the reflog for the reference *for every
update* to be sure that your timestamp is not already "in use".
  * There is a need to invent microsecond values on systems with
low-resolution clocks, again to ensure uniqueness.
  * If there is clock skew, then either you lose information about the
*order* of reference updates (which I would consider unacceptable), or
you might have to invent fictional times for new updates to retain the
order relative to previous updates. But inventing fictional times is
very problematic: suppose somebody's clock is incorrectly set to the
year 2020 when they make a commit. As far as Git is concerned, that is
plausible, so the reflog entry gets that timestamp. Then the user
corrects the clock and makes another commit. We would either be forced
to rewrite the old reflog entry(ies) or to give a timestamp in 2020 to
the new update.
* With reftable storage, multiple-reference updates can be done
atomically. But the reflogs don't retain the information about which
references were updated together.
* Even if all reflogs are retained, it is not possible to deduce the
global state of all references at a moment in the past.

I propose to solve all of the above problems, plus some others, as follows:

Let's keep an integer `update_index`, which starts at zero and
increases by one each atomic reference update (i.e., it increases by
one even if the update touches multiple references).

Let's store reflog entries under keys `(refname, update_index)`, and
store the timestamp of the entry *within* the record rather than as
part of the key. The tuple `(refname, update_index)` has to be encoded
somehow that ensures the desired order; see below. This encoding
ensures that the order of reference updates is well-defined regardless
of clock skew, and that one can use the reflog to determine which
references were updated together and what the complete state of
references was after any update (assuming that the user retains all
reflogs).

Let's store in the header of each reftable file the `min_update_index`
and `max_update_index` that are covered by the file. Do this such that
the indexes chain together properly for the reftables 

Re: reftable [v3]: new ref storage format

2017-07-27 Thread Stefan Beller
On Mon, Jul 24, 2017 at 4:00 PM, Shawn Pearce  wrote:
> On Mon, Jul 24, 2017 at 3:22 PM, Stefan Beller  wrote:

>>>  index record
>>>
>>> An index record describes the last entry in another block.
>>> Index records are written as:
>>>
>>> varint( prefix_length )
>>> varint( (suffix_length << 3) | 0 )
>>> suffix
>>> varint( block_offset )
>>>
>>> Index records use prefix compression exactly like `ref_record`.
>>>
>>> Index records store `block_offset` after the suffix, specifying the
>>> offset in bytes (from the start of the file) of the block that ends
>>> with this reference.
>>
>> Instead of hardcoding the "0" in the last 3 bits, maybe pick one
>> of the reserved bit patterns to be there? I would imagine this
>> makes debugging easier:
>>
>> 0x5? Hah that must be an index block I have been
>> looking at the wrong block!
>
> This is an excellent suggestion. I'll include it in the next iteration.

I was thinking about this a bit more and wondering if this would
allow mixing all sorts of entries in the same block. Think about one
of the most common cases client side:

git commit -m "just a lonely ref update, no fancy stuff"

For that we need (a) a single ref update and (b) now that the
format evolved a single reflog entry, maybe (c) an obj entry, too.

I just realize that in this case we'd crank down the block size
such that there is very little padding required, but I still wonder
if it would be possible to omit the second (and third) block,
but rather go for a "m"ixed block containing just these 3 entries.

Additionally by setting the block_size to a special value "0",
we could indicate the omission of a footer, such that a single
ref update reftable is super small.


Re: reftable [v3]: new ref storage format

2017-07-24 Thread Shawn Pearce
On Mon, Jul 24, 2017 at 3:22 PM, Stefan Beller  wrote:
> On Sat, Jul 22, 2017 at 11:29 AM, Shawn Pearce  wrote:
>> 3rd iteration of the reftable storage format.
>>
>> You can read a rendered version of this here:
>> https://googlers.googlesource.com/sop/jgit/+/reftable/Documentation/technical/reftable.md
>>
>> Significant changes from v2:
>> - efficient lookup by SHA-1 for allow-tip-sha1-in-want.
>
> I'll focus on that in the review, it sounds exciting.


>> ### Ref block format
> ...
>
>> A variable number of 4-byte `restart_offset` values follow the
>> records.  Offsets are relative to the start of the block and refer to
>> the first byte of any `ref_record` whose name has not been prefix
>> compressed.  Readers can start linear scans from any of these records.
>> Offsets in the first block are relative to the start of the file
>> (position 0), and include the file header.  This requires the first
>> restart in the first block to be at offset 8.
>>
>> The 2-byte `restart_count_m1` stores *one less* than the number of
>> entries in the `restart_offset` list.  There is always a restart
>> corresponding to the first ref record. Readers are responsible for
>> computing `restart_count = restart_count_m1 + 1`.
>
> I had to reread these two paragraphs a couple of times as it calls out
> the uninteresting things[1] and the interesting part is the logical
> conclusion that one has to make themselves,
> here is how I would write it:
>
> Readers can start linear scans from any record whose name has
> not been prefix compressed. The first record of a block must not
> be prefix-compressed.
>
> To aid finding the entry points for linear scans, a variable number
> of 4-byte `restart_offset` values follow the records. Offsets are
> relative to the start of the block and refer to the first byte of any
> such `ref_record` that is not prefix compressed.
> The first record can be omitted in the `restart_offset` values as it
> is implicit.

No, the first record must be listed in the restart_offset table. Its
not implicit. The count of it is implicit, to allow up to 65536
restart_offset entries using only a uint16 restart_count. Maybe that
is being too cute, and the count should just be the entry count.

> The `restart_offset' values must be sorted (ascending,
> descending?).

Sorted, ascending.


>> The `value` follows.  Its format is determined by `value_type`, one of
>> the following:
>>
>> - `0x0`: deletion; no value data (see transactions, below)
>> - `0x1`: one 20-byte object id; value of the ref
>> - `0x2`: two 20-byte object ids; value of the ref, peeled target
>
> Up to here it is easy to spot a pattern:
> The number indicates how many oids follow.
>
>> - `0x3`: symbolic reference: `varint( target_len ) target`
>> - `0x4`: length delimited extension: `varint( data_len ) data`
>
> This breaks the pattern.
>
> Instead of hardcoding the numbers here, I wonder if we rather
> want to make the bits more meaningful:
>
>   bit 0, 1: number of oids iff bit 2 unset
>
>   Iff bit 2 set, then we have a "varint (len) data"
>   that follows, bits 0,1 are used for a different purpose,
>   00 indicates 'symlink' and data is the string
>   01 indicates 'multihead' such as FETCH_HEAD
>   1* is reserved for now.
>
> This *may* be neat micro optimization, but hardcoding all bits
> to a lookup table is fine, too.

The problem I have with this bit-based rule is it breaks down later as
you use additional codes, or you find yourself limited by the bit
scheme and don't have the full range of 8 values available. So I
decided to be very literal about what the codes mean, and use a lookup
table.


>   Note that symrefs and multiheads could share the same
>   type, iff we had dissallowed '\n' in refnames. (we do?
>   otherwise FETCH_HEAD would be broken)
>   The differentiator would be the '\n' or '\0' at the end of the
>   first target.

True, \n is not allowed in a ref name, and neither is \0.

Symrefs in loose ref format use "ref: \n" as their content. We
could use that format in reftable, and then HEAD and FETCH_HEAD could
use the same value code, 0x3.


>>  index record
>>
>> An index record describes the last entry in another block.
>> Index records are written as:
>>
>> varint( prefix_length )
>> varint( (suffix_length << 3) | 0 )
>> suffix
>> varint( block_offset )
>>
>> Index records use prefix compression exactly like `ref_record`.
>>
>> Index records store `block_offset` after the suffix, specifying the
>> offset in bytes (from the start of the file) of the block that ends
>> with this reference.
>
> Instead of hardcoding the "0" in the last 3 bits, maybe pick one
> of the reserved bit patterns to be there? I would imagine this
> makes debugging easier:
>
> 0x5? Hah that must be an index block I have been
> looking at the wrong block!

This is an excellent suggestion. I'll include it in the next iteration.


>> ### Obj 

Re: reftable [v3]: new ref storage format

2017-07-24 Thread Stefan Beller
On Sat, Jul 22, 2017 at 11:29 AM, Shawn Pearce  wrote:
> 3rd iteration of the reftable storage format.
>
> You can read a rendered version of this here:
> https://googlers.googlesource.com/sop/jgit/+/reftable/Documentation/technical/reftable.md
>
> Significant changes from v2:
> - efficient lookup by SHA-1 for allow-tip-sha1-in-want.

I'll focus on that in the review, it sounds exciting.

> - type 0x4 for FETCH_HEAD, MERGE_HEAD.

So we'd have
varint( (suffix_length << 3) | value_type )
which leaves us with 5 bits for the varint. One bit is needed
to indicate if ew have more bytes coming in the varint, so
only 4 bits in the first byte to encode the suffix length.
That means we can have suffixes up to 15 and we'd still be fine
with just one byte IIUC. That should be enough, specifically for
contiguous numbers as ref names.

> ### Block size
>
> The `block_size` is arbitrarily determined by the writer, and does not
> have to be a power of 2.  The block size must be larger than the
> longest reference name or deflated log entry used in the repository,
> as references cannot span blocks.
>
> Powers of two that are friendly to the virtual memory system or
> filesystem (such as 4k or 8k) are recommended.  Larger sizes (64k) can
> yield better compression, with a possible increased cost incurred by
> readers during access.
>
> The largest block size is `16777215` bytes (15.99 MiB).

Thanks for calling this out, 16MB ought to be enough for everyone.
(Not a joke, as we can have multiple blocks)

> ### Ref block format
...

> A variable number of 4-byte `restart_offset` values follow the
> records.  Offsets are relative to the start of the block and refer to
> the first byte of any `ref_record` whose name has not been prefix
> compressed.  Readers can start linear scans from any of these records.
> Offsets in the first block are relative to the start of the file
> (position 0), and include the file header.  This requires the first
> restart in the first block to be at offset 8.
>
> The 2-byte `restart_count_m1` stores *one less* than the number of
> entries in the `restart_offset` list.  There is always a restart
> corresponding to the first ref record. Readers are responsible for
> computing `restart_count = restart_count_m1 + 1`.

I had to reread these two paragraphs a couple of times as it calls out
the uninteresting things[1] and the interesting part is the logical
conclusion that one has to make themselves,
here is how I would write it:

Readers can start linear scans from any record whose name has
not been prefix compressed. The first record of a block must not
be prefix-compressed.

To aid finding the entry points for linear scans, a variable number
of 4-byte `restart_offset` values follow the records. Offsets are
relative to the start of the block and refer to the first byte of any
such `ref_record` that is not prefix compressed.
The first record can be omitted in the `restart_offset` values as it
is implicit.

The `restart_offset' values must be sorted (ascending,
descending?).

The 2-byte `restart_count_m1` stores the number of optional
entry points, i.e. the number of values in `restart_offset'.
As the first record is omitted in the offset table, there is at
least one entry at 0. the `restart_count_m1` can be zero for
best compression.

[1] The last paragraph in my proposal sounds less like an
off-by-one error to me, but just states what the number is and how
it relates to the surroundings in the file format.

...

> The `value` follows.  Its format is determined by `value_type`, one of
> the following:
>
> - `0x0`: deletion; no value data (see transactions, below)
> - `0x1`: one 20-byte object id; value of the ref
> - `0x2`: two 20-byte object ids; value of the ref, peeled target

Up to here it is easy to spot a pattern:
The number indicates how many oids follow.

> - `0x3`: symbolic reference: `varint( target_len ) target`
> - `0x4`: length delimited extension: `varint( data_len ) data`

This breaks the pattern.

Instead of hardcoding the numbers here, I wonder if we rather
want to make the bits more meaningful:

  bit 0, 1: number of oids iff bit 2 unset

  Iff bit 2 set, then we have a "varint (len) data"
  that follows, bits 0,1 are used for a different purpose,
  00 indicates 'symlink' and data is the string
  01 indicates 'multihead' such as FETCH_HEAD
  1* is reserved for now.

This *may* be neat micro optimization, but hardcoding all bits
to a lookup table is fine, too.

  Note that symrefs and multiheads could share the same
  type, iff we had dissallowed '\n' in refnames. (we do?
  otherwise FETCH_HEAD would be broken)
  The differentiator would be the '\n' or '\0' at the end of the
  first target.

> Types `0x5..0x7` are reserved.

The proposal above would make 0x3, 0x6, 0x7 be reserved.

> ### Ref index
...
>
> An index block should only be written if there are at least 4 blocks
> in the file, as cold 

Re: reftable [v3]: new ref storage format

2017-07-23 Thread Ævar Arnfjörð Bjarmason
On Sat, Jul 22, 2017 at 8:29 PM, Shawn Pearce  wrote:
> 3rd iteration of the reftable storage format.
>
> You can read a rendered version of this here:
> https://googlers.googlesource.com/sop/jgit/+/reftable/Documentation/technical/reftable.md
>
> Significant changes from v2:
> - efficient lookup by SHA-1 for allow-tip-sha1-in-want.
> - type 0x4 for FETCH_HEAD, MERGE_HEAD.
> - file size up (27.7 M in v1, 34.4 M in v3)

I had some feedback on v2 here which still applies:
https://public-inbox.org/git/87k234tti7@gmail.com/

It would be good to either get a reply to that, or if you don't think
it's sensible for whatever reason and left it out of v3 have a
"feedback received but discarded for " in these summaries as
you're sending new versions.

Aside from the mail I sent I think that would be very useful in
general if there's been any other such feedback (I honestly don't know
if there has, I haven't been following this actively).


Re: reftable [v3]: new ref storage format

2017-07-22 Thread Shawn Pearce
3rd iteration of the reftable storage format.

You can read a rendered version of this here:
https://googlers.googlesource.com/sop/jgit/+/reftable/Documentation/technical/reftable.md

Significant changes from v2:
- efficient lookup by SHA-1 for allow-tip-sha1-in-want.
- type 0x4 for FETCH_HEAD, MERGE_HEAD.
- file size up (27.7 M in v1, 34.4 M in v3)

The file size increase is due to lookup by SHA-1 support. By using
unique abbreviations its adding about 7 MiB to the file size for
865,258 objects behind 866,456 refs. Average entry for this direction
costs 8 bytes, using a 6 byte/12 hex unique abbreviation.


## Overview

### Problem statement

Some repositories contain a lot of references (e.g.  android at 866k,
rails at 31k).  The existing packed-refs format takes up a lot of
space (e.g.  62M), and does not scale with additional references.
Lookup of a single reference requires linearly scanning the file.

Atomic pushes modifying multiple references require copying the
entire packed-refs file, which can be a considerable amount of data
moved (e.g. 62M in, 62M out) for even small transactions (2 refs
modified).

Repositories with many loose references occupy a large number of disk
blocks from the local file system, as each reference is its own file
storing 41 bytes (and another file for the corresponding reflog).
This negatively affects the number of inodes available when a large
number of repositories are stored on the same filesystem.  Readers can
be penalized due to the larger number of syscalls required to traverse
and read the `$GIT_DIR/refs` directory.

### Objectives

- Near constant time lookup for any single reference, even when the
  repository is cold and not in process or kernel cache.
- Near constant time verification a SHA-1 is referred to by at least
  one reference (for allow-tip-sha1-in-want).
- Efficient lookup of an entire namespace, such as `refs/tags/`.
- Support atomic push `O(size_of_update)` operations.
- Combine reflog storage with ref storage.

### Description

A reftable file is a portable binary file format customized for
reference storage. References are sorted, enabling linear scans,
binary search lookup, and range scans.

Storage in the file is organized into blocks.  Prefix compression
is used within a single block to reduce disk space.  Block size is
tunable by the writer.

### Performance

Space used, packed-refs vs. reftable:

repository | packed-refs | reftable | % original | avg ref  | avg obj
---|:|-:|---:|-:|:
android|  62.2 M |   34.4 M | 55.2%  | 33 bytes | 8 bytes
rails  |   1.8 M |1.1 M | 57.7%  | 29 bytes | 6 bytes
git|  78.7 K |   44.0 K | 60.0%  | 50 bytes | 6 bytes
git (heads)|   332 b |239 b | 72.0%  | 31 bytes | 0 bytes

Scan (read 866k refs), by reference name lookup (single ref from 866k
refs), and by SHA-1 lookup (refs with that SHA-1, from 866k refs):

format  | scan| by name| by SHA-1
|:|---:|---:
packed-refs |  402 ms | 409,660.1 usec | 412,535.8 usec
reftable|  112 ms |  42.7 usec | 340.8 usec

Space used for 149,932 log entries for 43,061 refs,
reflog vs. reftable:

format| size  | avg log
--|--:|---:
$GIT_DIR/logs | 173 M | 1209 bytes
reftable  |   4 M |   30 bytes

## Details

### Peeling

References in a reftable are always peeled.

### Reference name encoding

Reference names should be encoded with UTF-8.

### Network byte order

All multi-byte, fixed width fields are in network byte order.

### Ordering

Blocks are lexicographically ordered by their first reference.

### Directory/file conflicts

The reftable format accepts both `refs/heads/foo` and
`refs/heads/foo/bar` as distinct references.

This property is useful for retaining log records in reftable, but may
confuse versions of Git using `$GIT_DIR/refs` directory tree to
maintain references.  Users of reftable may choose to continue to
reject `foo` and `foo/bar` type conflicts to prevent problems for
peers.

## File format

### Structure

A reftable file has the following high-level structure:

first_block {
  header
  first_ref_block
}
ref_blocks*
ref_index?
obj_blocks*
obj_index?
log_blocks*
log_index?
footer

### Block size

The `block_size` is arbitrarily determined by the writer, and does not
have to be a power of 2.  The block size must be larger than the
longest reference name or deflated log entry used in the repository,
as references cannot span blocks.

Powers of two that are friendly to the virtual memory system or
filesystem (such as 4k or 8k) are recommended.  Larger sizes (64k) can
yield better compression, with a possible increased cost incurred by
readers during access.

The largest block size is `16777215` bytes (15.99 MiB).

### Header

An 8-byte header appears at the beginning of the file:

'\1REF'
uint8(