Re: [git-users] SHA-1 checksum

Dale R. Worley Mon, 08 Aug 2016 08:42:17 -0700

Sharan Basappa <[email protected]> writes:
> So, if Git stores files using just their checksums then
>
> a) how does it look up (or retrieve) a specific file in the database?
> For example, if it wants to find a file in the database, does it take the
> checksum, compute the checksum of every file in its database, and
> compare them?
> This looks pretty costly & rather unnecessary to me.
>
> b) how does it keep track of the file names that are required when it
> gives us a working copy?


Consider one of my Git repositories.  The file .git/HEAD contains

    ref: refs/heads/hobgoblin

That points to the file .git/refs/heads/hobgoblin, which contains the
hash of the commit which is the tip of the "hobgoblin" branch:

    92f8f718eb9b19f921f20283e55c56e8dc66ed10

That points to the file
.git/objects/92/f8f718eb9b19f921f20283e55c56e8dc66ed10.  That file's
contents are zlib-compressed, not plain text, so you have to use
"git cat-file -p 92f8f718eb9b19f921f20283e55c56e8dc66ed10" to read them:

    tree d5d1ad293f8fdd4a4a4e0e9a73c5c3c851126c22
    parent 39c83b086e141bb00d32737a4e2aae675d795f44
    author Dale R. Worley <[email protected]> 1470669963 -0400
    committer Dale R. Worley <[email protected]> 1470669963 -0400

    ...
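
If it helps to see those steps spelled out, here is a rough Python sketch
of the same walk: read HEAD, follow the ref, turn the hash into a path
under .git/objects, and decompress the file.  The function names are my
own invention, not anything Git provides, and the sketch assumes the ref
and the object are both stored "loose" (not in .git/packed-refs or in a
pack file).

```python
import zlib
from pathlib import Path


def resolve_head(git_dir):
    """Return the commit hash HEAD points to, following one symbolic ref."""
    git_dir = Path(git_dir)
    head = (git_dir / "HEAD").read_text().strip()
    if head.startswith("ref: "):
        # Symbolic ref, e.g. "ref: refs/heads/hobgoblin".
        # Assumption: the ref is a loose file, not an entry in packed-refs.
        return (git_dir / head[len("ref: "):]).read_text().strip()
    # Detached HEAD: the file holds the hash itself.
    return head


def loose_object_path(git_dir, sha):
    """First two hex digits name the directory, the other 38 name the file."""
    return Path(git_dir) / "objects" / sha[:2] / sha[2:]


def read_loose_object(git_dir, sha):
    """Return (type, body) for a loose object.

    After decompression the file is "<type> <size>\\0<body>".
    """
    raw = zlib.decompress(loose_object_path(git_dir, sha).read_bytes())
    header, _, body = raw.partition(b"\x00")
    obj_type, size = header.split(b" ", 1)
    if int(size) != len(body):
        raise ValueError("size in header does not match body length")
    return obj_type.decode(), body
```

In a repository whose tip commit is still a loose object,
read_loose_object(".git", resolve_head(".git")) should give back the type
"commit" and the same text that "git cat-file -p" prints.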

So the hash of the tree object is
d5d1ad293f8fdd4a4a4e0e9a73c5c3c851126c22 and the hash of the one parent
commit is 39c83b086e141bb00d32737a4e2aae675d795f44.  The tree object is
in .git/objects/d5/d1ad293f8fdd4a4a4e0e9a73c5c3c851126c22, but again,
you have to use git-cat-file to read it:

    100644 blob 0215040f90f133f999bac86eede7565c6d09b93d        -NOTES
    100644 blob ef62bfd5a8e81c8ca13372b2436bccf1c0698185        -NOTES.MYOB
    100644 blob 65dda34dadf753dbfc791b5811f3cd437a666cac        -NOTES.XA.recovery
    100644 blob 88182ec16035fd4d77c0c1312ce1510f2f8da4b2        -NOTES.XB.recovery
    100644 blob 73415b6e2ebcd6a384874c0ab40ec70a5112db18        -NOTES.freeze
    100644 blob 3a4fb8ec6e7c0219c4d7ab002eaaa84abae2c72d        -NOTES.gleaning
    040000 tree c21923c2647ecec7d627a49e51b4e8b5d19344b4        .a68g
    100644 blob f9a4c46f50234a11f9ad283973ed2f11a4758f2f        .aspell.en.prepl
    100644 blob 182c2739a5cc69a322a41723d4423ed1d8a6266e        .aspell.en.pws
    ...
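
For the curious, the listing above is a prettified view.  As far as I
know, the decompressed body of a tree object is a sequence of binary
entries, each "<mode> <name>\0" followed by the 20 raw bytes of the hash,
with the mode written without the leading zero (so "40000" for a
subdirectory).  A minimal sketch of a parser, assuming a SHA-1
repository; parse_tree is a made-up name:

```python
def parse_tree(body):
    """Split a raw tree body into (mode, name, hex_hash) tuples.

    Each entry is: ASCII mode, a space, the file name, a NUL byte,
    then 20 raw bytes of SHA-1 (assumption: SHA-1 object format).
    """
    entries = []
    pos = 0
    while pos < len(body):
        space = body.index(b" ", pos)
        nul = body.index(b"\x00", space)
        mode = body[pos:space].decode()
        name = body[space + 1:nul].decode("utf-8", errors="replace")
        sha = body[nul + 1:nul + 21].hex()
        entries.append((mode, name, sha))
        pos = nul + 21  # skip past the NUL and the 20 hash bytes
    return entries
```

This is where the file names live: the blob itself knows nothing about
what it is called, and the tree pairs each name with a hash.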

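As a loose object, the contents of the file "-NOTES" would be in
.git/objects/02/15040f90f133f999bac86eede7565c6d09b93d.  In this case,
though, that object is in one of the "pack" files, so git-cat-file has to
look the hash up in the indexes of the pack files to find it.  Either
way, Git never recomputes checksums to find an object: the hash itself
is the lookup key.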
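
To give a feel for why a lookup by hash is cheap, here is a simplified
sketch of the idea behind a pack index: the hashes are kept sorted, a
256-entry "fanout" table narrows the search to hashes sharing the same
first byte, and a binary search finishes the job.  This is only an
in-memory illustration with names I made up; it does not read the real
on-disk .idx format.

```python
import bisect


def build_fanout(sorted_hashes):
    """fanout[b] = number of hashes whose first byte is <= b."""
    fanout = [0] * 256
    for h in sorted_hashes:
        fanout[h[0]] += 1
    for b in range(1, 256):
        fanout[b] += fanout[b - 1]
    return fanout


def find_in_index(sorted_hashes, fanout, target):
    """Return the position of target (raw hash bytes), or -1 if absent."""
    first = target[0]
    lo = fanout[first - 1] if first > 0 else 0
    hi = fanout[first]
    # Binary search only within the slice that shares the first byte.
    i = bisect.bisect_left(sorted_hashes, target, lo, hi)
    if i < hi and sorted_hashes[i] == target:
        return i
    return -1
```

In a real index the position would then be used to find the object's
offset inside the pack file; no file contents are hashed or compared
along the way.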

The critical idea is that files are stored by their *contents*, not
their *names*.  Any particular blob of content has an eternally unique
name (its hash), which will be the same in any repository containing a
blob with the same bytes.  "tree" objects are used to catalog the names
of files and their contents.
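
As a small illustration of "same bytes, same name": to my knowledge the
hash of a blob is the SHA-1 of a short header ("blob", a space, the
length in decimal, a NUL byte) followed by the contents.  A minimal
sketch, assuming a repository using the SHA-1 object format; blob_hash
is a made-up name:

```python
import hashlib


def blob_hash(content):
    """Hash Git would assign to a blob with these bytes (SHA-1 repos)."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()
```

The result depends only on the bytes, never on the file name or the
repository, which is why two repositories holding the same content agree
on its hash.  "git hash-object <file>" should print the same value.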

Dale
