----- Original Message -----
From: Sharan Basappa
>Philip Oakley wrote:
> You have it in one.

> Yes, that is the reason that git computes the sha1 of the file's
> contents - it provides integrity, veracity and non-repudiation (the last
> one is still true, though cryptanalysis is getting better, so sha1 is no
> longer recommended, and Git is looking at how to progress to newer
> crypto-hashes).
> Once Git has the sha1s of the files in a directory, it does the same
> again for the 'file' that lists the file names, mode bits and their
> contents' sha1s, and ever onwards up the trees to the commit, which
> lists the sha1 of its top-level tree and the sha1s of its parents.

> So if you have the sha1 of the tip of a branch, such as master, and you
> have a repo that holds that sha1, then you have the full crypto
> integrity that your copy (with all its history) is identical to that of
> the originators - your own Dali, Rembrandt, Gauguin, hanging in your
> hall... and it isn't even a replica, it's the real thing!

Dear Philip, Michael,

Thanks. It's true that checksums like SHA give a very reliable signature of any file. But where things start getting confusing (to me) is when I read "In fact, Git stores everything in its database not by file name but by the hash value of its contents.".

Correct, in the .git/objects folder you will see those new objects stored as ab/cdef01234 etc.
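To make that concrete, here is a minimal Python sketch of how the name of a loose blob object can be derived from the file's contents. The function names are made up for illustration; the "blob <size>\0" header and the two-character directory split are how Git names loose objects, but treat this as a sketch rather than Git's own code.

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    # Git hashes a short header ("blob <size>\0") followed by the raw contents.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()

def loose_object_path(object_id: str) -> str:
    # Hypothetical helper: first two hex digits name the folder, the rest the file.
    return f".git/objects/{object_id[:2]}/{object_id[2:]}"

oid = git_blob_id(b"hello world\n")
print(oid)                     # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
print(loose_object_path(oid))  # .git/objects/3b/18e512dba79e4c8300dd08aeb37f8e728b8dad
```

Note that the hash itself says where the object lives, so nothing has to be scanned or re-hashed to find it.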

This is from the book Pro Git.

So, if Git stores files using just their checksums then

a) how does it look up (or retrieve) a specific file in the database?
For example, if it wants to find a file in the database, does it take the checksum, then compute the checksum of every file in its database and compare?

No. You will see in my reply that there is a 'next level' file which lists the names and associates each one with the sha1 hash it needs. These are the ones called 'tree' objects.

This looks pretty costly & rather unnecessary to me.

You are looking at this from the wrong side. It's about speed of reconstruction when you are getting a specific revision back from the store. Don't forget that Git normally works on the revision of the complete project, not just some little file.

b) how does it keep track of the file names that are required when it gives us a working copy?

Starting at the commit sha1, it looks for that sha1 file, which lists the top level tree sha1. Expand that as the top level directory names, with sha1s for each next level directory or file. It's almost identical to how a file system works! (I think Linus, who wrote git, wrote a little OS, nothing big, once ;-)
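The walk from commit to tree to file can be sketched with a toy object store in Python. Everything here is an assumption for illustration: the `store` dict, the `put`, `put_tree` and `checkout` helpers, and the JSON encoding of trees and commits are invented, and Git's real on-disk formats differ. The shape of the lookup is the point.

```python
import hashlib
import json

store = {}  # toy object database: sha1 hex -> (kind, payload bytes)

def put(kind, payload):
    # Key every object by the hash of its kind and contents (simplified header).
    oid = hashlib.sha1(kind.encode() + b"\0" + payload).hexdigest()
    store[oid] = (kind, payload)
    return oid

def put_tree(entries):
    # entries maps a name to (kind, oid); JSON is a stand-in for Git's tree format.
    return put("tree", json.dumps(entries, sort_keys=True).encode())

def checkout(commit_id):
    # Follow commit -> top level tree -> subtrees -> blobs, rebuilding path names.
    kind, payload = store[commit_id]
    assert kind == "commit"
    files = {}

    def walk(tree_id, prefix):
        for name, (entry_kind, oid) in json.loads(store[tree_id][1]).items():
            if entry_kind == "tree":
                walk(oid, prefix + name + "/")
            else:
                files[prefix + name] = store[oid][1]

    walk(json.loads(payload)["tree"], "")
    return files

readme = put("blob", b"hello\n")
main = put("blob", b"print('hi')\n")
src = put_tree({"main.py": ("blob", main)})
root = put_tree({"README": ("blob", readme), "src": ("tree", src)})
commit = put("commit", json.dumps({"tree": root, "parents": []}).encode())

files = checkout(commit)
print(sorted(files))  # ['README', 'src/main.py']
```

Every step is a direct lookup by hash, and the file names live only in the tree objects, never in the blobs themselves.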

Once you have all that nicely fixed in your head, you can then look (if you are interested in the next layer of digging) at pack files, which are Git's way of compressing all those sha1 files. They have lots of repetition because nothing much changes from one rev to the next (or at least nothing much should, because the changes within a commit should be small! - it's part of what makes Git work).
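As a rough analogy only (real pack files store deltas between similar objects and then zlib-compress them, which this does not reproduce), the following Python sketch shows why two nearly identical revisions are cheap to keep together. The `make_revision` helper and the generated contents are made up for the demonstration.

```python
import hashlib
import zlib

def make_revision(changed_line=None):
    # Build 200 lines of hard-to-compress text, optionally editing one line.
    lines = [hashlib.sha1(str(i).encode()).hexdigest() for i in range(200)]
    if changed_line is not None:
        lines[changed_line] = "this line was edited"
    return "\n".join(lines).encode()

rev1 = make_revision()
rev2 = make_revision(changed_line=100)

# Compressing each revision alone cannot exploit what they share.
separate = len(zlib.compress(rev1)) + len(zlib.compress(rev2))
# Compressing them as one stream lets the second copy refer back to the first.
together = len(zlib.compress(rev1 + rev2))

print(separate, together)
```

The combined size should come out well below the sum of the two separate sizes, which is the effect packing relies on when revisions differ by only a small change.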

Thanks again ...
No problems