Re: [git-users] SHA-1 checksum

Philip Oakley Mon, 08 Aug 2016 05:49:22 -0700

----- Original Message -----

From: Sharan Basappa
>Philip Oakley wrote:
> You have it in one.

> Yes, that is the reason that Git computes the sha1 of the file's
> contents: it provides integrity, veracity and non-repudiation (the last
> one is still true, though cryptanalysis is getting better, so sha1 is no
> longer recommended, and Git is looking at how to progress to newer
> crypto-hashes).
> Once Git has the sha1s of the files in a directory, it does the same
> again for the 'file' that lists the file names, mode bits and their
> contents' sha1s, and so on up the trees to the commit, which
> lists the sha1 of its top-level tree and the sha1s of its parents.

> So if you have the sha1 of the tip of a branch, such as master, and you
> have a repo that holds that sha1, then you have full cryptographic
> assurance that your copy (with all its history) is identical to the
> originator's: your own Dali, Rembrandt or Gauguin hanging in your
> hall... and it isn't even a replica, it's the real thing!

Dear Philip, Michael,

Thanks. It's true that checksums like SHA give a virtually unique signature of any file. But where things start getting confusing (to me) is when I read "In fact, Git stores everything in its database not by file name but by the hash value of its contents.".

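Correct: in the .git/objects folder you will see those objects stored as ab/cdef01234 etc., where the first two hex digits of the sha1 name the subdirectory and the remaining 38 name the file. So finding an object never requires a search; its sha1 is its path.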
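A minimal Python sketch of how a loose object lands in .git/objects, assuming the loose-object format (a zlib-compressed "<type> <size>\0" header plus content). It writes into a throwaway temporary directory rather than a real repository, and `object_path`, `write_loose_object` and `read_loose_object` are hypothetical helper names, not Git commands.

```python
import hashlib
import tempfile
import zlib
from pathlib import Path


def object_path(git_dir: Path, sha: str) -> Path:
    # First two hex digits name the subdirectory, the other 38 the file.
    return git_dir / "objects" / sha[:2] / sha[2:]


def write_loose_object(git_dir: Path, obj_type: str, data: bytes) -> str:
    """Store data as a zlib-compressed loose object; return its sha1."""
    raw = f"{obj_type} {len(data)}".encode() + b"\0" + data
    sha = hashlib.sha1(raw).hexdigest()
    path = object_path(git_dir, sha)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(zlib.compress(raw))
    return sha


def read_loose_object(git_dir: Path, sha: str) -> tuple[str, bytes]:
    """Fetch an object straight from its path: no other object is examined."""
    raw = zlib.decompress(object_path(git_dir, sha).read_bytes())
    header, _, data = raw.partition(b"\0")
    obj_type, size = header.decode().split(" ")
    if int(size) != len(data):
        raise ValueError(f"corrupt object {sha}")
    return obj_type, data


with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp) / ".git"  # scratch directory, not a real repository
    sha = write_loose_object(git_dir, "blob", b"hello world\n")
    print(object_path(git_dir, sha).relative_to(git_dir))
    # objects/3b/18e512dba79e4c8300dd08aeb37f8e728b8dad (on POSIX systems)
    print(read_loose_object(git_dir, sha))  # ('blob', b'hello world\n')
```

Lookup is a single path computation plus one file read, which is why there is no need to re-hash or compare every file in the database.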

This is from the book Pro Git.

So, if Git stores files using just their checksums then

a) how does it look up (or retrieve) a specific file in the database?
For example, if it wants to find a file in the database, does it take the checksum and start computing the checksum of every file in its database and comparing?

You will see in my reply that there is a 'next level' file which has the lists of names to associate with the sha1 hashes it needs. These are the ones called 'tree' objects.
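As a sketch of how a tree object supplies those names, the Python below splits a raw tree body into (mode, name, sha1) entries. It assumes the binary entry layout "<mode> <name>\0" followed by 20 raw sha1 bytes; the tree body is built by hand and `parse_tree` is an illustrative helper, not a Git API.

```python
def parse_tree(body: bytes) -> list[tuple[str, str, str]]:
    """Split a raw tree object body into (mode, name, sha_hex) entries.

    Each entry is "<mode> <name>\\0" followed by a 20-byte binary sha1;
    the sha1 bytes may contain any value, so they are sliced by length.
    """
    entries = []
    pos = 0
    while pos < len(body):
        space = body.index(b" ", pos)
        nul = body.index(b"\0", space)
        mode = body[pos:space].decode()
        name = body[space + 1:nul].decode()
        sha = body[nul + 1:nul + 21].hex()
        entries.append((mode, name, sha))
        pos = nul + 21
    return entries


# A hand-built tree body with one file and one subdirectory.
blob_sha = "3b18e512dba79e4c8300dd08aeb37f8e728b8dad"     # "hello world\n"
subtree_sha = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # empty tree, a placeholder
body = (b"100644 hello.txt\0" + bytes.fromhex(blob_sha)
        + b"40000 src\0" + bytes.fromhex(subtree_sha))

by_name = {name: (mode, sha) for mode, name, sha in parse_tree(body)}
print(by_name["hello.txt"])
# ('100644', '3b18e512dba79e4c8300dd08aeb37f8e728b8dad')
```

The file name lives in the tree, and the blob itself only holds content, so the same blob can appear under several names without being stored twice.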

This looks pretty costly & rather unnecessary to me.

You are looking at this from the wrong side. It's about speed of reconstruction when you are getting a specific revision back from the store. Don't forget that Git normally works on a revision of the complete project, not just some little file.

b) how does it keep track of the file names that are required when it gives us a working copy?

Starting at the commit sha1, it looks up that sha1's file, which lists the top-level tree sha1. Expand that into the top-level directory names, with sha1s for each next-level directory or file. It's almost identical to how a file system works! (I think Linus, who wrote Git, wrote a little OS, nothing big, once ;-)
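The walk from a commit down to a working copy can be sketched in Python against a hypothetical in-memory object store (a dict standing in for .git/objects). This is a simplification: the commit omits the author/committer lines real commits carry, and only regular files and subdirectories are handled.

```python
import hashlib

# Hypothetical in-memory object database: sha1 -> (type, raw body).
store: dict[str, tuple[str, bytes]] = {}


def put(obj_type: str, data: bytes) -> str:
    """Hash an object the way Git does and keep it in the store."""
    sha = hashlib.sha1(f"{obj_type} {len(data)}".encode() + b"\0" + data).hexdigest()
    store[sha] = (obj_type, data)
    return sha


def entry(mode: str, name: str, sha: str) -> bytes:
    """Encode one tree entry: "<mode> <name>\\0" + 20 raw sha1 bytes."""
    return f"{mode} {name}".encode() + b"\0" + bytes.fromhex(sha)


def checkout(commit_sha: str) -> dict[str, bytes]:
    """Rebuild {path: content} by walking commit -> trees -> blobs."""
    commit_body = store[commit_sha][1]
    # The first line of a commit object is "tree <sha1 of top-level tree>".
    root_tree = commit_body.split(b"\n", 1)[0].split(b" ")[1].decode()
    files: dict[str, bytes] = {}

    def walk(tree_sha: str, prefix: str) -> None:
        body = store[tree_sha][1]
        pos = 0
        while pos < len(body):
            space = body.index(b" ", pos)
            nul = body.index(b"\0", space)
            mode, name = body[pos:space], body[space + 1:nul].decode()
            sha = body[nul + 1:nul + 21].hex()
            pos = nul + 21
            if mode == b"40000":  # subdirectory: descend into the next tree
                walk(sha, prefix + name + "/")
            else:                 # file: fetch its blob directly by sha1
                files[prefix + name] = store[sha][1]

    walk(root_tree, "")
    return files


readme = put("blob", b"# demo\n")
main_py = put("blob", b"print('hi')\n")
src = put("tree", entry("100644", "main.py", main_py))
root = put("tree", entry("100644", "README.md", readme) + entry("40000", "src", src))
# Stripped-down commit: real commits also carry author/committer lines.
commit = put("commit", f"tree {root}\n\ninitial commit\n".encode())

print(checkout(commit))
# {'README.md': b'# demo\n', 'src/main.py': b"print('hi')\n"}
```

Every step is a direct lookup by sha1, much like following directory entries to inodes in a file system.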

Once you have all that nicely fixed in your head, you can then look (if you are interested in the next layer of digging) at pack files, which are Git's way of compressing all those sha1 files, largely by storing similar objects as deltas against each other. There is lots of repetition because not much changes from one revision to the next (or at least it shouldn't, because the changes within a commit should be small! That's part of what makes Git work).
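A toy illustration of the delta idea behind pack files: describe a new revision as "copy these bytes from the old one" plus "insert these new bytes". This uses Python's difflib to find the matching runs and a made-up tuple format; Git's real pack deltas use a compact binary encoding, their own matching algorithm, and zlib compression on top, so treat this only as a sketch of the principle.

```python
from difflib import SequenceMatcher


def make_delta(base: bytes, target: bytes) -> list[tuple]:
    """Describe target as ("copy", offset, length) / ("insert", bytes) ops on base."""
    ops: list[tuple] = []
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2 - i1))      # reuse bytes already in base
        elif j2 > j1:
            ops.append(("insert", target[j1:j2]))  # "replace"/"insert": new bytes
        # "delete": emit nothing; those base bytes are simply not copied
    return ops


def apply_delta(base: bytes, ops: list[tuple]) -> bytes:
    """Rebuild the target from the base plus the delta ops."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            out += base[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)


rev1 = b"line 1\nline 2\nline 3\n" * 20
rev2 = rev1.replace(b"line 2\n", b"line two\n", 1)
delta = make_delta(rev1, rev2)
assert apply_delta(rev1, delta) == rev2
new_bytes = sum(len(op[1]) for op in delta if op[0] == "insert")
print(f"{len(rev2)} bytes in rev2, only {new_bytes} stored literally in the delta")
```

The smaller each change, the fewer literal bytes a delta needs, which is one reason small, focused commits pack so well.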

Thanks again ...

--

No problems

--
You received this message because you are subscribed to the Google Groups "Git for human beings" group.
