Derrick Stolee <[email protected]> writes:
> On 3/30/2018 9:25 AM, Jakub Narebski wrote:
>> Derrick Stolee <[email protected]> writes:
>>
>>> +== graph-*.graph files have the following format:
>> What is this '*' here?
>
> No longer necessary. It used to be a placeholder for a hash value, but
> now the graph is stored in objects/info/commit-graph.
All right.
Excuse me replying to v4 instead of v6 of the patch series, where it
would be answered or rather made moot already.
>>
>> [...]
>>> + The remaining data in the body is described one chunk at a time, and
>>> + these chunks may be given in any order. Chunks are required unless
>>> + otherwise specified.
>> Does Git need to understand all chunks, or could there be optional
>> chunks that can be safely ignored (like in PNG format)? Though this may
>> be overkill, and could be left for later revision of the format if
>> deemed necessary.
>
> In v6, the format and design documents are edited to make clear the
> use of optional chunks, specifically for future extension without
> increasing the version number.
That's good.
>>> +CHUNK DATA:
>>> +
>>> + OID Fanout (ID: {'O', 'I', 'D', 'F'}) (256 * 4 bytes)
>>> + The ith entry, F[i], stores the number of OIDs with first
>>> + byte at most i. Thus F[255] stores the total
>>> + number of commits (N).
>> All right, it is small enough that can be required even for a very small
>> number of commits.
>>
>>> +
>>> + OID Lookup (ID: {'O', 'I', 'D', 'L'}) (N * H bytes)
>>> + The OIDs for all commits in the graph, sorted in ascending order.
>>> +
>>> + Commit Data (ID: {'C', 'G', 'E', 'T' }) (N * (H + 16) bytes)
>> Do commits need to be put here in the ascending order of OIDs?
>
> Yes.
>
>> If so, this would mean that it is not possible to add information about
>> new commits by only appending data and maybe overwriting some fields, I
>> think. You would need to do full rewrite to insert new commit in
>> appropriate place.
>
> That is the idea. This file is not updated with every new commit, but
> instead will be updated on some scheduled cleanup events. The
> commit-graph file is designed in a way to be non-critical, and not
> tied to the packfile layout. This allows flexibility for when to do
> the write.
>
> For example, in GVFS, we will write a new commit-graph when there are
> new daily prefetch packs.
>
> This could also integrate with 'gc' and 'repack' so whenever they are
> triggered the commit-graph is written as well.
I wonder if it would be possible to use existing hooks...
> Commits that do not exist in the commit-graph file will load from the
> object database as normal (after a failed lookup in the commit-graph
> file).
Ah. I thought wrongly that it would (or at least could) be something
that can be kept up to date, and extended when adding any new commit.
>>> + * The first H bytes are for the OID of the root tree.
>>> + * The next 8 bytes are for the int-ids of the first two parents
>>> + of the ith commit. Stores value 0xffffffff if no parent in that
>>> + position. If there are more than two parents, the second value
>>> + has its most-significant bit on and the other bits store an array
>>> + position into the Large Edge List chunk.
>>> + * The next 8 bytes store the generation number of the commit and
>>> + the commit time in seconds since EPOCH. The generation number
>>> + uses the higher 30 bits of the first 4 bytes, while the commit
>>> + time uses the 32 bits of the second 4 bytes, along with the lowest
>>> + 2 bits of the lowest byte, storing the 33rd and 34th bit of the
>>> + commit time.
>>> +
>>> + Large Edge List (ID: {'E', 'D', 'G', 'E'}) [Optional]
>>> + This list of 4-byte values store the second through nth parents for
>>> + all octopus merges. The second parent value in the commit data stores
>>> + an array position within this list along with the most-significant
>>> bit
>>> + on. Starting at that array position, iterate through this list of
>>> int-ids
>>> + for the parents until reaching a value with the most-significant bit
>>> on.
>>> + The other bits correspond to the int-id of the last parent.
>>
>> All right, that is one chunk that cannot use fixed-length records; this
>> shouldn't matter much, as we iterate only up to the number of parents
>> less two.
>
> Less one: the second "parent" column of the commit data chunk is used
> to point into this list, so (P-1) parents are in this chunk for a
> commit with P parents.
Right.
>> A question: what happens to the last list of parents? Is there a
>> guardian value of 0xffffffff at last place?
>
> The termination condition is in the position of the last parent, since
> the most-significant bit is on. The other 31 bits contain the int-id
> of the parent.
Ah. I have misunderstood the format: I thought that first entry is
marked with most-significant bit set to 1, and all the rest to 0, while
it is last entry (last parent) has most-significant bit set, while all
others (if any) do not. So there is no need for guardian value.
Best regards,
--
Jakub Narębski