From: "Michael" <keybou...@gmail.com>
To: <git-users@googlegroups.com>
Sent: Friday, May 20, 2016 7:28 PM
Subject: Re: [git-users] How GIT stores data



On 2016-05-20, at 11:10 AM, Sharan Basappa <sharan.basa...@gmail.com> wrote:

Folks,

I am pretty new to Git, though I am using it for a couple of projects (without really understanding it).

The Git documentation says that Git stores data as a stream of snapshots. Compared to other VCS tools, the only difference I can tell is that Git stores the entire file for each version, while other VCS tools might store only the differences.

Can someone help me understand this?

Sure. Think of Git as a three-layered tool.

The top layer is a polished interface, called "porcelain", that is designed to make it easy to manage snapshots and to compare and merge filesystem trees.

The bottom layer, on the other hand, is a filesystem. Files in this filesystem are read-only, and each file's name is computed from its content (it is a SHA-1 hash of that content). So identical files have the same name, and are stored only once in the filesystem.
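A minimal sketch in Python of how such a content-derived name can be computed, assuming Git's default SHA-1 object format: Git hashes a short header (the word `blob`, the size, and a NUL byte) followed by the file's bytes. The function name `git_blob_id` is made up for illustration; real Git also zlib-compresses the object and stores it under `.git/objects/`, which this skips.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the object name Git would give a file with this content.

    Git hashes the header "blob <size>\\0" followed by the raw bytes.
    """
    header = f"blob {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()


a = git_blob_id(b"hello world\n")
b = git_blob_id(b"hello world\n")
c = git_blob_id(b"hello world!\n")
print(a == b)  # True: identical content, identical name, stored once
print(a == c)  # False: any change in content gives a different name
```

You can compare the result with `git hash-object <file>` run on a real file with the same content.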

Built on top of these fixed files, which never change, are directory objects (called "trees") that map human-readable filenames to internal names. And since a directory object is itself a file in this filesystem, if everything in a directory is identical, then the directory object is identical too, and is stored only once.
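A simplified sketch of naming a directory object the same way, assuming a flat directory of regular files only: each entry records a mode (`100644` for a regular, non-executable file), the filename and the raw 20-byte blob name, and the sorted listing is hashed with a `tree <size>` header. Subdirectories (mode `40000`) and Git's exact sort rule for them are left out, and `tree_id` is a hypothetical helper, not part of Git.

```python
import hashlib
from typing import Dict


def object_id(kind: str, body: bytes) -> bytes:
    """Raw 20-byte SHA-1 of '<kind> <size>\\0<body>'."""
    return hashlib.sha1(f"{kind} {len(body)}\0".encode() + body).digest()


def tree_id(files: Dict[str, bytes]) -> str:
    """Name a flat directory of regular files (no subdirectories; simplified).

    Entries are sorted by filename, so the result depends only on the
    names and contents, not on the order the files were added.
    """
    body = b""
    for name in sorted(files):
        body += b"100644 " + name.encode() + b"\0" + object_id("blob", files[name])
    return object_id("tree", body).hex()


src_a = {"main.c": b"int main(void) { return 0; }\n", "README": b"hello\n"}
src_b = {"README": b"hello\n", "main.c": b"int main(void) { return 0; }\n"}
print(tree_id(src_a) == tree_id(src_b))  # True: same names, same contents
src_b["README"] = b"hello again\n"
print(tree_id(src_a) == tree_id(src_b))  # False: one file changed
```

Because the tree only holds the names of its files, an unchanged directory hashes to the same name and needs no new storage.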

Based on this, it's pretty easy to see that if two commits contain completely identical files, then the only thing that differs is the commit object itself, which records a timestamp and the user's commit message.
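To see why a second commit of unchanged files costs almost nothing, here is a toy in-memory object store. The `store` dict, the `put` and `commit` helpers, the identity and the timestamps are all hypothetical. Each object is keyed by its hash, so writing an identical blob or tree again adds nothing; only the commit object, which carries the timestamp and message, is new. The commit text follows the general layout of a Git commit (`tree`, optional `parent`, `author`, `committer`, a blank line, the message), but this is a sketch, not Git's code.

```python
import hashlib
from typing import Dict, Optional, Tuple

# Hypothetical in-memory object store: object id -> (kind, body)
store: Dict[str, Tuple[str, bytes]] = {}


def put(kind: str, body: bytes) -> str:
    """Hash an object and store it; storing an identical object again is a no-op."""
    oid = hashlib.sha1(f"{kind} {len(body)}\0".encode() + body).hexdigest()
    store[oid] = (kind, body)
    return oid


def commit(files: Dict[str, bytes], message: str, timestamp: int,
           parent: Optional[str] = None) -> str:
    """Store blobs, one flat tree and a commit object; return the commit id."""
    entries = b""
    for name in sorted(files):
        blob = put("blob", files[name])
        entries += b"100644 " + name.encode() + b"\0" + bytes.fromhex(blob)
    tree = put("tree", entries)
    text = f"tree {tree}\n"
    if parent is not None:
        text += f"parent {parent}\n"
    ident = f"A U Thor <author@example.com> {timestamp} +0000"
    text += f"author {ident}\ncommitter {ident}\n\n{message}\n"
    return put("commit", text.encode())


files = {"README": b"hello\n", "main.c": b"int main(void) { return 0; }\n"}
c1 = commit(files, "first", 1463760000)
print(len(store))  # 4: two blobs, one tree, one commit
c2 = commit(files, "same files, new message", 1463763600, parent=c1)
print(len(store))  # 5: only the new commit object was added
```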

(The middle layer, by the way, consists of low-level tools, known as "plumbing", designed to work with the files in this filesystem.)

--
Sharan,
In addition to Michael's description, Git also has a method for compressing its repository, called pack files, which it uses where possible.

So rather than recording changes (as you noted other tools do), Git records complete snapshots, and then compresses the full history of all revisions in one go (see some of Linus's laws).
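As a rough illustration of why compressing all revisions together pays off, the sketch below compares compressing ten near-identical snapshots one at a time with compressing them in one go. It uses plain `zlib` on concatenated data, which is not Git's pack format (pack files store objects as deltas against similar objects and then compress them), and the 4000-byte random "file" with one-byte edits is made up.

```python
import random
import zlib

rng = random.Random(0)  # fixed seed so the illustration is repeatable
# A made-up 4000-byte "file" of random bytes, which barely compresses on its own
current = bytearray(rng.getrandbits(8) for _ in range(4000))

snapshots = []
for _ in range(10):
    current[rng.randrange(len(current))] = rng.getrandbits(8)  # one-byte edit per revision
    snapshots.append(bytes(current))  # store the complete snapshot, not a diff

separately = sum(len(zlib.compress(s)) for s in snapshots)
together = len(zlib.compress(b"".join(snapshots)))
print(separately, together)
```

Each snapshot compressed alone stays about as big as the file itself, while the combined stream costs roughly one copy plus a little for each later revision, because zlib can refer back to the previous, almost identical snapshot.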

The compressed repository (with all its history) can be smaller than the checked-out work tree, so holding whole snapshots is efficient.

There are also a whole load of SHA-1 hash keys that pervade and validate the history. This is good, because if your hash key has the same value as someone else's hash key, you know they are seeing exactly the same history and content as you, no matter how far away or unknown they are to you. (And if the keys differ, all bets are off!)
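A small sketch of why matching hash keys imply matching history: each commit names its parent by hash, so the newest commit's id depends on every earlier commit and file. The single file `README`, the identity, the timestamp and the messages are hypothetical simplifications of real commits.

```python
import hashlib
from typing import List, Optional


def object_id(kind: str, body: bytes) -> str:
    """Hex SHA-1 of '<kind> <size>\\0<body>'."""
    return hashlib.sha1(f"{kind} {len(body)}\0".encode() + body).hexdigest()


def commit_chain(versions: List[bytes]) -> List[str]:
    """Commit one single-file snapshot per version; return the commit ids in order."""
    ids: List[str] = []
    parent: Optional[str] = None
    ident = "A U Thor <author@example.com> 1463760000 +0000"  # hypothetical, fixed
    for n, content in enumerate(versions):
        blob = bytes.fromhex(object_id("blob", content))
        tree = object_id("tree", b"100644 README\0" + blob)
        text = f"tree {tree}\n"
        if parent is not None:
            text += f"parent {parent}\n"  # links this commit to the previous one
        text += f"author {ident}\ncommitter {ident}\n\nversion {n}\n"
        parent = object_id("commit", text.encode())
        ids.append(parent)
    return ids


mine = commit_chain([b"v1\n", b"v2\n", b"v3\n"])
theirs = commit_chain([b"v1\n", b"v2\n", b"v3\n"])
tampered = commit_chain([b"v1 (edited)\n", b"v2\n", b"v3\n"])

print(mine[-1] == theirs[-1])    # True: same history, same tip id
print(mine[-1] == tampered[-1])  # False: editing the first commit changes every later id
```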

Philip