On 6/28/2018 1:11 AM, Grant Welch wrote:
I recently read the "Supercharging the Git Commit Graph blog by
Derrick Stolee. I found the article interesting and wanted to verify
the performance numbers for myself. Then that led me to want to know
more about the implementation, so I read the technical design notes in
commit-graph.txt, and then I jumped into the format documentation in
commit-graph-format.txt.

Along the way, I noticed a few issues. They might just be errors in
the documentation, but I figured it was worth documenting my entire
process just to be sure.

"Supercharging the Git Commit Graph", by Derrick Stolee:
   
https://blogs.msdn.microsoft.com/devops/2018/06/25/supercharging-the-git-commit-graph/

# TL;DR

I found a few discrepencies between the documentation in
commit-graph-format.txt and the results that I observed for myself.

1. The "Commit Data" chunk signature is documented to be 'CGET', but
it should be 'CDAT'.

commit-graph.c:18
   #define GRAPH_CHUNKID_DATA 0x43444154 /* "CDAT" */

Thanks for catching this! Thankfully, this is an easy fix, as we only need to update the documentation.

2. The "no parent" value is documented to be 0xffffffff, but is
actually 0x70000000.

commit-graph.c:34
   #define GRAPH_PARENT_NONE 0x70000000

This is a more important mistake, as it affects the data that was written in the commit-graph file.

Part of the problem is that leading hex digit of 0x7 which in binary is 0b0111. We already designed a limit of at most 2^{31}-1 (~2.1 billion) commits in the commit-graph because of the way we track octopus edges, but this mistake has cost us more: we cannot store more than ~1.8 billion commits.

I'm sorry for this mixup, mostly because it is aesthetically unpleasant. Those extra 300 million commits mean less to me than having a clean file format.

3. The "generation" field isn't set on any of the commits. (I don't
know what to make of this.)

This is a difference between 2.18 and current 'master', which merged ds/generation-numbers. Commit-graphs written with Git 2.18 have all generation numbers listed as GENERATION_NUMBER_ZERO (0), which lets future versions know that the generation number is not computed yet, so the next commit-graph write will compute the correct generation number.

I'll send a patch soon fixing these doc issues.

Thanks,
-Stolee

Reply via email to