----- Original Message -----
From: Sharan Basappa
>Philip Oakley wrote:
> You have it in one.
> Yes that is the reason that git computes the sha1 of the file's
> contents - it provides integrity, veracity and non-repudiation (the last
> one is still true, though cryptanalysis is getting better, so sha1 is no
> longer recommended, and Git is looking at how to progress to newer
> crypto-hashes).
> Once Git has the sha1's of the files in a directory, it does the same
> again for the 'file' that lists the file names, mode bits and their
> content's sha1s, and ever onwards up the trees to the commit, which
> lists the sha1s of its parents.
> So if you have the sha1 of the tip of a branch, such as master, and you
> have a repo that holds that sha1, then you have the full crypto
> integrity that your copy (with all its history) is identical to that of
> the originator - your own Dali, Rembrandt, Gauguin, hanging in your
> hall... and it isn't even a replica, it's the real thing!
Dear Philip, Michael,
Thanks. It's true that checksums like SHA give a unique signature of any
file. But where things start getting confusing (to me) is when I read "In
fact, Git stores everything in its database not by file name but by the
hash value of its contents.".
Correct, in the .git/objects folder you will see those new objects stored as
ab/cdef01234 etc.
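To make that concrete, here is a minimal Python sketch of how a file's
object name and its place under .git/objects can be derived. It assumes the
usual "loose object" layout; the function names git_blob_id and
loose_object_path are made up for illustration and are not part of Git.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object id Git gives to a file with this content.

    Git hashes a short header ("blob <size>" plus a NUL byte) followed by
    the raw content, not the content alone.
    """
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def loose_object_path(object_id: str) -> str:
    """Illustrative helper: first two hex digits form the directory name."""
    return ".git/objects/" + object_id[:2] + "/" + object_id[2:]


blob_id = git_blob_id(b"hello world\n")
print(blob_id)
print(loose_object_path(blob_id))
```

The id printed here should match what `git hash-object` reports for a file
with the same content. Note that the file name plays no part in it.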
This is from the book Pro Git.
So, if Git stores files using just their checksums then
a) how does it look up (or retrieve) a specific file in the database?
For example, if it wants to find a file in the database, does it take the
checksum, start computing the checksum of every file in its database, and
compare?
You will see in my reply that there is a 'next level' file which has the
lists of names to associate with the sha1 hash it needs. These are the ones
called 'tree' objects.
This looks pretty costly & rather unnecessary to me.
You are looking at this from the wrong side. It's about speed of
reconstruction when you are getting a specific revision back from the store.
Don't forget that Git normally works on the revision of the complete
project, not just some little file.
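On the cost question: nothing is re-hashed or scanned at lookup time, because
the hash is itself the key. A rough sketch of the idea, using a Python dict
as a stand-in for .git/objects (ObjectStore and its methods are hypothetical
names, and for brevity it hashes the raw bytes without Git's object header):

```python
import hashlib


class ObjectStore:
    """Toy content-addressed store; a dict stands in for .git/objects."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes) -> str:
        # The hash is computed once, when the content is stored.
        key = hashlib.sha1(data).hexdigest()
        self._objects[key] = data
        return key

    def get(self, key: str) -> bytes:
        # Retrieval is a direct lookup by key; no other object is touched.
        return self._objects[key]


store = ObjectStore()
readme_id = store.put(b"hello\n")

# A 'tree' is just the list of names with the hash each one points to.
tree = {"README": readme_id}

content = store.get(tree["README"])
print(content)
```

Storing the same content twice gives the same key, so identical files are
kept only once.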
b) how does it keep track of the file names that are required when it gives
us a working copy?
Starting at the commit sha1, it looks for that sha1 file, which lists the
top level tree sha1. Expand that as the top level directory names, with
sha1s for each next level directory or file. It's almost identical to how a
file system works! (I think Linus, who wrote git, wrote a little OS, nothing
big, once ;-)
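The walk from commit to working copy can be sketched in a few lines of
Python. This is a toy model only: the objects dict, the put and checkout
helpers, and the JSON serialization are all assumptions made for the
example, and Git's real object format is different.

```python
import hashlib
import json

objects = {}  # hypothetical in-memory stand-in for .git/objects


def put(kind, payload):
    """Store an object under the SHA-1 of its serialized form; return the id."""
    data = json.dumps([kind, payload], sort_keys=True).encode()
    object_id = hashlib.sha1(data).hexdigest()
    objects[object_id] = (kind, payload)
    return object_id


def checkout(tree_id, prefix=""):
    """Walk a tree recursively and return {path: file content}."""
    files = {}
    _, entries = objects[tree_id]
    for name, child_id in entries.items():
        child_kind, child = objects[child_id]
        path = prefix + name
        if child_kind == "tree":
            files.update(checkout(child_id, path + "/"))
        else:
            files[path] = child
    return files


# Build a tiny project bottom-up: blobs, then trees, then the commit.
main_c = put("blob", "int main(void) { return 0; }\n")
readme = put("blob", "A tiny project\n")
src = put("tree", {"main.c": main_c})
root = put("tree", {"README": readme, "src": src})
commit = put("commit", {"tree": root, "parents": [], "message": "first"})

# Rebuild the working copy starting from nothing but the commit id.
_, commit_body = objects[commit]
working_copy = checkout(commit_body["tree"])
print(working_copy)
```

The file names live only in the tree objects, and each level is reached by
a direct lookup of the hash listed in the level above.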
Once you have all that nicely fixed in your head, you can then look (if you
are interested in the next layer of digging) at pack files which are Git's
way of compressing all those sha1 files which have lots of repetition
because nothing much changes from one rev to the next (or at least it
shouldn't, because the changes within a commit should be small! - it's part of
what makes Git work)
Thanks again ...
--
No problems