From: "Michael" <keybou...@gmail.com>
To: <git-users@googlegroups.com>
Sent: Friday, May 20, 2016 7:28 PM
Subject: Re: [git-users] How GIT stores data
On 2016-05-20, at 11:10 AM, Sharan Basappa <sharan.basa...@gmail.com> wrote:
Folks,
I am pretty new to Git, though I am using it for a couple of projects
(without much understanding as such).
The Git documentation says that Git stores data as a stream of
snapshots. Compared to other VCS tools, the only difference I can
tell is that Git stores the entire file for each version, while other VCS
tools might store only the differences.
Can someone help me understand this?
Sure. Think of Git as a three-layered tool.
The top layer is a polished interface, called "Porcelain", that is designed
to make it easy to manage snapshots, and to compare and merge filesystem trees.
The bottom layer, on the other hand, is a filesystem. Files in this
filesystem are read-only. The name of each file is fixed by its content
(it is a SHA-1 hash of that content), so identical files have the same name
and are stored only once in the filesystem.
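To make "names fixed by content" concrete, here is a minimal Python sketch of
how Git names a file's content (a "blob"), assuming a repository that uses the
default SHA-1 object format. It should agree with what `git hash-object` prints
for the same bytes; the sample strings are made up.

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """Return the object name Git gives a file's content (a 'blob')."""
    # Git hashes a short header ("blob <size>\0") followed by the raw bytes,
    # so the name depends only on the content, never on the filename.
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()

a = git_blob_id(b"hello\n")
b = git_blob_id(b"hello\n")   # same content, e.g. a copy under another path
c = git_blob_id(b"hello!\n")
print(a == b)            # True: identical files share one name, one stored object
print(a == c)            # False
print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 (empty file)
```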
Built on top of these fixed, unchanging files are directory objects, which
map human-readable filenames to internal names. And since a directory object
is itself a file in this filesystem, if everything in a directory is
identical, then the directory object is identical too, and is stored only once.
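A similar sketch for directory objects (Git calls them "trees"), again
assuming SHA-1 object names. Each entry is the mode, the name, a NUL byte and
the raw 20-byte ID of the child; the sample files and contents are invented.

```python
import hashlib

def hash_object(obj_type: str, body: bytes) -> bytes:
    """SHA-1 over '<type> <size>\\0<body>', as Git does; returns the raw 20 bytes."""
    header = f"{obj_type} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).digest()

def tree_id(entries):
    """entries: (mode, name, raw_id) tuples; mode "40000" marks a subdirectory."""
    def sort_key(entry):
        mode, name, _ = entry
        # Git sorts subdirectory names as if they ended with "/".
        return name.encode() + (b"/" if mode == "40000" else b"")
    body = b"".join(
        mode.encode() + b" " + name.encode() + b"\x00" + raw_id
        for mode, name, raw_id in sorted(entries, key=sort_key)
    )
    return hash_object("tree", body)

readme = hash_object("blob", b"docs\n")
lib = tree_id([("100644", "util.py", hash_object("blob", b"x = 1\n"))])

root_v1 = tree_id([("100644", "README", readme), ("40000", "lib", lib)])
# Describe the same directory again, listing the entries in another order:
root_v2 = tree_id([("40000", "lib", lib), ("100644", "README", readme)])
print(root_v1 == root_v2)  # True: same contents, same tree object, stored once
print(tree_id([]).hex())   # 4b825dc642cb6eb9a060e54bf8d69288fbee4904 (empty tree)
```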
Based on this, it's pretty easy to see that if two commits record exactly the
same files, then the only thing that differs between them is the commit object
itself, which holds a timestamp and the user's commit message.
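A commit object is itself just a small text object naming a tree, its
parent (if any), author and committer lines, and the message. In this sketch
the identity, timestamps and messages are hypothetical; two commits of the
same snapshot share one tree ID and differ only in their own commit objects.

```python
import hashlib

def object_id(obj_type, body):
    # Same scheme as for blobs and trees: SHA-1 of "<type> <size>\0" + body.
    header = f"{obj_type} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()

def commit_id(tree, parent, timestamp, message):
    ident = f"A U Thor <author@example.com> {timestamp} +0000"  # hypothetical identity
    text = f"tree {tree}\n"
    if parent is not None:
        text += f"parent {parent}\n"
    text += f"author {ident}\ncommitter {ident}\n\n{message}\n"
    return object_id("commit", text.encode())

blob = hashlib.sha1(b"blob 6\x00hello\n").digest()           # raw ID of "hello\n"
tree = object_id("tree", b"100644 hello.txt\x00" + blob)     # one-file snapshot
first = commit_id(tree, None, 1463761680, "Add hello.txt")
# Commit exactly the same snapshot again, later and with a new message:
second = commit_id(tree, first, 1463765280, "Same files, new commit")
print(first != second)  # True: only the commit objects differ; the tree is shared
```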
(The middle layer, by the way, consists of low-level tools, known as
"plumbing", designed to work with the files in this filesystem.)
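Plumbing commands such as `git hash-object -w` and `git cat-file` write and
read objects like these. Roughly, a "loose" object is the header plus the
content, zlib-compressed, stored under `objects/` at a path taken from its ID.
The sketch below mimics that in a temporary directory; it is simplified (real
Git takes more care when writing, and many objects end up in pack files), and
the helper names are hypothetical.

```python
import hashlib
import os
import tempfile
import zlib

def write_object(objects_dir, obj_type, body):
    """Store an object as a zlib-compressed loose file; return its hex ID."""
    data = f"{obj_type} {len(body)}".encode() + b"\x00" + body
    oid = hashlib.sha1(data).hexdigest()
    # Loose objects live at objects/<first 2 hex digits>/<remaining 38 digits>.
    path = os.path.join(objects_dir, oid[:2], oid[2:])
    if not os.path.exists(path):  # same content, same path: nothing to rewrite
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(zlib.compress(data))
    return oid

def read_object(objects_dir, oid):
    """Inflate a loose object and split it into (type, body)."""
    with open(os.path.join(objects_dir, oid[:2], oid[2:]), "rb") as f:
        data = zlib.decompress(f.read())
    header, _, body = data.partition(b"\x00")
    obj_type, size = header.decode().split(" ")
    if int(size) != len(body):
        raise ValueError("corrupt object: size mismatch")
    return obj_type, body

with tempfile.TemporaryDirectory() as objects_dir:
    oid = write_object(objects_dir, "blob", b"hello\n")
    again = write_object(objects_dir, "blob", b"hello\n")  # dedup: same ID, no new file
    roundtrip = read_object(objects_dir, oid)
print(oid == again, roundtrip)  # True ('blob', b'hello\n')
```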
--
Sharan,
In addition to Michael's description, Git does have a method for compressing
its repository, called pack files, which it uses where possible.
So rather than recording changes, Git records complete snapshots (as noted),
and then compresses the full history of all revisions in one go
(see some of Linus's laws).
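Pack files actually store objects as deltas against similar objects and then
compress them, which is more involved than this. The toy comparison below only
illustrates the intuition: full snapshots that are mostly the same compress
very well when they are compressed together. The "history" is synthetic.

```python
import hashlib
import zlib

# Synthetic history: 10 snapshots of a ~10 KB text file, each version
# editing one more line than the last (all content here is made up).
lines = [f"line {i}: {hashlib.sha1(str(i).encode()).hexdigest()}" for i in range(200)]
snapshots = []
for version in range(10):
    lines[version * 20] = f"line {version * 20}: edited in version {version}"
    snapshots.append("\n".join(lines).encode())

raw = sum(len(s) for s in snapshots)
separately = sum(len(zlib.compress(s)) for s in snapshots)  # each snapshot alone
together = len(zlib.compress(b"".join(snapshots)))          # whole history in one go
print(raw, separately, together)
```

Compressed together, each later snapshot mostly becomes back-references to
the one before it, so the total stays close to the size of a single
compressed snapshot.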
The compressed repository (with all its history) can be smaller than the
checked-out working tree, so holding whole snapshots is efficient. There
are also SHA-1 hash keys throughout that identify and validate the
history. That is useful: if your hash key has the same value as their hash
key, then you know they are seeing exactly the same history and content as
you, no matter how far away and unknown they are to you. (And if the keys
differ, all bets are off!)
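Why one hash vouches for the whole history: each commit's ID covers its tree
(and so every file) and its parent's ID. This sketch is simplified to one file
per commit; the author, timestamp and file name are hypothetical. It builds a
three-commit history and shows that editing the first version changes the ID
at the tip.

```python
import hashlib

def object_id(obj_type, body):
    """Raw 20-byte SHA-1 of '<type> <size>\\0<body>', as Git computes it."""
    header = f"{obj_type} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).digest()

def tip_of_history(versions):
    """Commit each version of a single file in turn; return the last commit's hex ID."""
    ident = "A U Thor <author@example.com> 1463761680 +0000"  # hypothetical
    parent = None
    for n, content in enumerate(versions):
        blob = object_id("blob", content)
        tree = object_id("tree", b"100644 notes.txt\x00" + blob)
        text = f"tree {tree.hex()}\n"
        if parent is not None:
            text += f"parent {parent}\n"  # the parent's ID is hashed into this commit
        text += f"author {ident}\ncommitter {ident}\n\nVersion {n}\n"
        parent = object_id("commit", text.encode()).hex()
    return parent

ours = tip_of_history([b"v1\n", b"v2\n", b"v3\n"])
theirs = tip_of_history([b"v1\n", b"v2\n", b"v3\n"])
tampered = tip_of_history([b"v1, quietly edited\n", b"v2\n", b"v3\n"])
print(ours == theirs)    # True: same history, same tip ID
print(ours == tampered)  # False: one old edit changes every later ID
```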
Philip
--
You received this message because you are subscribed to the Google Groups "Git for
human beings" group.