On Fri, 6 Sep 2013, Nguyễn Thái Ngọc Duy wrote:
>
> Signed-off-by: Nguyễn Thái Ngọc Duy <[email protected]>
> ---
> Should be up to date with Nico's latest implementation and also cover
> additions to the format that everybody seems to agree on:
>
> - new types for canonical trees and commits
> - sha-1 table covering missing objects in thin packs
Great! I've merged this into my branch with the following amendment:
diff --git a/Documentation/technical/pack-format.txt b/Documentation/technical/pack-format.txt
index 1980794..d0c2cde 100644
--- a/Documentation/technical/pack-format.txt
+++ b/Documentation/technical/pack-format.txt
@@ -81,6 +81,13 @@ Git pack format
completing thin packs or preserving somewhat ill-formatted
objects.
+ Thin packs are used for transferring on the wire and may omit delta
+ base objects, expecting the receiver to add them at the end of the pack
+ before writing to disk. The number of objects contained in the pack
+ header must account for those omitted objects in any case. To indicate
+ no more objects are included in a thin pack, a "type 0" byte
+ indicator is inserted after the last transmitted object.
+
- The trailer records 20-byte SHA-1 checksum of all of the above.
=== Pack v4 tables
@@ -88,10 +95,7 @@ Git pack format
- A table of sorted SHA-1 object names for all objects contained in
the on-disk pack.
- Thin packs are used for transferring on the wire and may omit base
- objects, expecting the receiver to add them before writing to
- disk. The SHA-1 table in thin packs must include the omitted objects
- as well.
+ The SHA-1 table in thin packs must include the omitted objects as well.
This table can be referred to using "SHA-1 reference encoding": the
index, in variable length encoding, to this table.
@@ -158,7 +162,7 @@ Git pack format
entry (LSB not set), or an instruction to copy tree entries from
another tree (LSB set).
- For copying from another tree, is the LSB of the second number is
+ For copying from another tree, if the LSB of the second number is
set, it will be followed by a base tree SHA-1. If it's not set,
the last base tree will be used.
> diff --git a/Documentation/technical/pack-format.txt b/Documentation/technical/pack-format.txt
> index 8e5bf60..c5327ff 100644
> --- a/Documentation/technical/pack-format.txt
> +++ b/Documentation/technical/pack-format.txt
> @@ -1,7 +1,7 @@
> Git pack format
> ===============
>
> -== pack-*.pack files have the following format:
> +== pack-*.pack files version 2 and 3 have the following format:
>
> - A header appears at the beginning and consists of the following:
>
> @@ -36,6 +36,132 @@ Git pack format
>
> - The trailer records 20-byte SHA-1 checksum of all of the above.
>
> +== pack-*.pack files version 4 have the following format:
> +
> + - A header appears at the beginning and consists of the following:
> +
> + 4-byte signature:
> + The signature is: {'P', 'A', 'C', 'K'}
> +
> + 4-byte version number (network byte order): must be 4
> +
> + 4-byte number of objects contained in the pack (network byte order)
> +
> + - A series of tables, described separately.
> +
> + - The tables are followed by number of object entries, each of
> + which looks like below:
> +
> + (undeltified representation)
> + n-byte type and length (4-bit type, (n-1)*7+4-bit length)
> + data
> +
> + (deltified representation)
> + n-byte type and length (4-bit type, (n-1)*7+4-bit length)
> + base object name in SHA-1 reference encoding
> + compressed delta data
> +
> + "type" is used to determine object type. Commit has type 1, tree
> + 2, blob 3, tag 4, ref-delta 7, canonical-commit 9 (commit type
> + with bit 3 set), canonical-tree 10 (tree type with bit 3 set).
> + Compared to v2, ofs-delta type is not used, and canonical-commit
> + and canonical-tree are new types.
> +
> + In undeltified format, blobs and tags are compressed. Trees are
> + not compressed at all. Some headers in commits are stored
> + uncompressed, the rest is compressed. Tree and commit
> + representations are described in detail separately.
> +
> + Blobs and tags are deltified and compressed the same way as in
> + v3. Commits are not deltified. Trees are deltified using
> + undeltified representation.
> +
> + Trees and commits in canonical types are in the same format as
> + v2: in canonical format and deflated. They can be used for
> + completing thin packs or preserving somewhat ill-formatted
> + objects.
> +
> + - The trailer records 20-byte SHA-1 checksum of all of the above.
> +
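
The type numbering described above could be sketched roughly as follows. This is only an illustration of the listed values, not the actual implementation: the names TYPE_NAMES, CANONICAL_BIT and classify are hypothetical, and type 0 is taken from the thin pack amendment rather than from the quoted text.

```python
# Type codes as listed in the patch text. Type 0 comes from the thin pack
# amendment, where it marks the end of the transmitted objects.
# All names below are hypothetical and chosen for illustration only.
TYPE_NAMES = {
    0: "thin-pack-end",
    1: "commit",
    2: "tree",
    3: "blob",
    4: "tag",
    7: "ref-delta",
    9: "canonical-commit",
    10: "canonical-tree",
}

CANONICAL_BIT = 1 << 3  # "bit 3 set" marks the canonical variants


def classify(type_code):
    """Return (name, base_name, is_canonical) for a pack v4 type code."""
    if type_code not in TYPE_NAMES:
        # For example 6 (ofs-delta in v2) is not used in v4.
        raise ValueError(f"unknown pack v4 object type: {type_code}")
    canonical = type_code in (9, 10)
    base = type_code & ~CANONICAL_BIT if canonical else type_code
    return TYPE_NAMES[type_code], TYPE_NAMES[base], canonical
```

Clearing bit 3 of a canonical type gives back the plain commit or tree type, which is all the sketch relies on.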
> +=== Pack v4 tables
> +
> + - A table of sorted SHA-1 object names for all objects contained in
> + the on-disk pack.
> +
> + Thin packs are used for transferring on the wire and may omit base
> + objects, expecting the receiver to add them before writing to
> + disk. The SHA-1 table in thin packs must include the omitted objects
> + as well.
> +
> + This table can be referred to using "SHA-1 reference encoding": the
> + index, in variable length encoding, to this table.
> +
> + - Ident table: the uncompressed length in variable encoding,
> + followed by zlib-compressed dictionary. Each entry consists of
> + two prefix bytes storing timezone followed by a NUL-terminated
> + string.
> +
> + Entries should be sorted by frequency so that the most frequent
> + entry has the smallest index, thus most efficient variable
> + encoding.
> +
> + The table can be referred to using "ident reference encoding": the
> + index number, in variable length encoding, to this table.
> +
> + - Tree path table: the same format as the ident table. Each entry
> + consists of two prefix bytes storing tree entry mode, then a
> + NUL-terminated path name. Same sort order recommendation applies.
> +
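
A minimal sketch of reading one of these dictionary tables (ident or tree path), assuming the leading uncompressed length has already been decoded and is passed in as an integer. The function name parse_dict_table is hypothetical, and the two prefix bytes are returned raw because their exact interpretation is not spelled out here.

```python
import zlib


def parse_dict_table(compressed, uncompressed_len):
    """Split a zlib-compressed dictionary into (prefix, string) pairs.

    Each entry is two prefix bytes (timezone or tree entry mode) followed
    by a NUL-terminated string. The index of an entry in the returned
    list is the index used to refer to it.
    """
    raw = zlib.decompress(compressed)
    if len(raw) != uncompressed_len:
        raise ValueError("dictionary length mismatch")
    entries = []
    pos = 0
    while pos < len(raw):
        prefix = raw[pos:pos + 2]
        # Search for the terminator only after the prefix: the two
        # prefix bytes may themselves contain a NUL byte.
        end = raw.index(b"\0", pos + 2)
        entries.append((prefix, raw[pos + 2:end]))
        pos = end + 1
    return entries
```

Skipping the prefix before looking for the NUL matters, since a prefix such as 0x00 0x3c would otherwise be mistaken for an empty string.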
> +=== Commit representation
> +
> + - n-byte type and length (4-bit type, (n-1)*7+4-bit length)
> +
> + - Tree SHA-1 in SHA-1 reference encoding
> +
> + - Parent count in variable length encoding
> +
> + - Parent SHA-1s in SHA-1 reference encoding
> +
> + - Author reference in ident reference encoding
> +
> + - Author timestamp in variable length encoding
> +
> + - Committer reference in ident reference encoding
> +
> + - Committer timestamp, encoded as a difference against author
> + timestamp with the LSB used to indicate negative difference.
> +
> + - Compressed data of remaining header and the body
> +
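
The committer timestamp rule might look like this once the variable length number has been decoded. This is a sketch under one assumption not stated explicitly in the text: that the magnitude of the difference sits in the bits above the sign bit. Both function names are hypothetical.

```python
def encode_committer_delta(author_time, committer_time):
    """Encode the committer time as a difference against the author time.

    The LSB is the sign flag (1 means the difference is negative); the
    magnitude is assumed to occupy the remaining upper bits.
    """
    diff = committer_time - author_time
    return (abs(diff) << 1) | (1 if diff < 0 else 0)


def decode_committer_time(author_time, field):
    """Recover the committer time from the decoded difference field."""
    magnitude = field >> 1
    if field & 1:
        return author_time - magnitude
    return author_time + magnitude
```

Since committer and author times are usually close, the difference stays small and should need fewer bytes than a full timestamp.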
> +=== Tree representation
> +
> + - n-byte type and length (4-bit type, (n-1)*7+4-bit length)
> +
> + - Number of tree entries in variable length encoding
> +
> + - A number of entries, each of which takes one of the following forms
> +
> + - INT(path_index << 1) INT(sha1_index)
> +
> + - INT((entry_start << 1) | 1) INT(entry_count << 1)
> +
> + - INT((entry_start << 1) | 1) INT((entry_count << 1) | 1) INT(base_sha1_index)
> +
> + INT() denotes a number in variable length encoding. path_index is
> + the index to the tree path table. sha1_index is the index to the
> + SHA-1 table. entry_start is the first tree entry to copy
> + from. entry_count is the number of tree entries to
> + copy. base_sha1_index is the index to SHA-1 table of the base tree
> + to copy from.
> +
> + The LSB of the first number indicates whether it's a plain tree
> + entry (LSB not set), or an instruction to copy tree entries from
> + another tree (LSB set).
> +
> + For copying from another tree, is the LSB of the second number is
> + set, it will be followed by a base tree SHA-1. If it's not set,
> + the last base tree will be used.
> +
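
The three entry forms can be told apart along these lines. This sketch works on numbers that have already been decoded from the variable length encoding, it does not resolve the copies against the base trees, and the name decode_tree_ops and the tuple layout are hypothetical.

```python
def decode_tree_ops(numbers):
    """Turn a flat sequence of decoded INT() values into tree operations.

    Returns a list of tuples:
      ("entry", path_index, sha1_index)
      ("copy", entry_start, entry_count, base_sha1_index)
    """
    it = iter(numbers)
    last_base = None
    ops = []
    for first in it:
        second = next(it)
        if first & 1 == 0:
            # LSB of the first number not set: plain tree entry.
            ops.append(("entry", first >> 1, second))
            continue
        # LSB of the first number set: copy entries from another tree.
        if second & 1:
            # LSB of the second number set: an explicit base follows.
            last_base = next(it)
        elif last_base is None:
            raise ValueError("copy instruction without a base tree")
        ops.append(("copy", first >> 1, second >> 1, last_base))
    return ops
```

A copy without an explicit base reuses the most recent one, so the decoder has to carry that state from one instruction to the next.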
> == Original (version 1) pack-*.idx files have the following format:
>
> - The header consists of 256 4-byte network byte order
> @@ -160,3 +286,8 @@ Pack file entry: <+
> corresponding packfile.
>
> 20-byte SHA-1-checksum of all of the above.
> +
> +== Version 3 pack-*.idx files support only *.pack files version 4. The
> + format is the same as version 2 except that the table of sorted
> + 20-byte SHA-1 object names is missing in the .idx files. The same
> + table exists in .pack files and will be used instead.
> --
> 1.8.2.83.gc99314b
>