Re: [git-users] How GIT stores data

Philip Oakley Fri, 20 May 2016 12:16:24 -0700

From: "Michael" <keybou...@gmail.com>
To: <git-users@googlegroups.com>
Sent: Friday, May 20, 2016 7:28 PM
Subject: Re: [git-users] How GIT stores data




On 2016-05-20, at 11:10 AM, Sharan Basappa <sharan.basa...@gmail.com> wrote:

Folks,
I am fairly new to Git, though I am using it for a couple of projects (without much understanding as such).
In the Git documentation, it is mentioned that Git stores data as a stream of snapshots. Compared to other VCS tools, the only difference I am able to tell is that Git stores the entire file for each version, while other VCS tools might store only differences.

Can someone help me understand this?


Sure. Think of Git as a three-layered tool.

The top layer is a polished interface, called "Porcelain", that is designed to easily manage snapshots, comparisons, and merges of filesystem trees.

The bottom layer, on the other hand, is a filesystem. Files in this filesystem are read-only. The names of files are fixed based on their content. So identical files have the same name, and are stored once in the filesystem.
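
To make the "name is fixed by content" idea concrete, here is a minimal Python sketch of how a file's internal name can be computed, assuming the default SHA-1 object format (the name is the SHA-1 of a short "blob <size>" header plus the file content). The function name git_blob_id is made up for illustration; it is not part of Git.

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """Return the object name Git would give a file with this content.

    Assumes the SHA-1 object format: hash of b"blob <size>\\0" + content.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()

first = git_blob_id(b"hello\n")
second = git_blob_id(b"hello\n")    # same content, e.g. a copy under another filename
changed = git_blob_id(b"hello!\n")  # one byte different

print(first == second)   # True: identical content, identical name, stored once
print(first == changed)  # False: any change gives a different name
print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 (the empty file)
```

If Git is installed, the result can be compared with what `git hash-object <file>` prints for the same content.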

Built on top of these unchanging files are directory objects, which map human-readable filenames to internal names. And, since a directory is itself a filesystem object, if everything in a directory is identical, then the directory object is identical too, and is stored only once.
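
The same trick can be sketched for directories. The Python below is a simplified illustration, not Git's implementation: it assumes a flat directory of ordinary, non-executable files (mode 100644) with ASCII names, and the helper names object_id and tree_id are made up. A directory ("tree") object is a list of mode, filename and internal name, and it is named by hashing that list.

```python
import hashlib

def object_id(kind: bytes, body: bytes) -> bytes:
    """Raw 20-byte SHA-1 name of an object of the given kind."""
    header = kind + b" " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).digest()

def tree_id(files: dict) -> str:
    """Name of a flat directory, given a {filename: content} mapping.

    Assumption: every entry is a regular file with mode 100644.
    """
    body = b""
    for name in sorted(files):  # entries are stored sorted by name
        blob = object_id(b"blob", files[name])
        body += b"100644 " + name.encode("ascii") + b"\0" + blob
    return object_id(b"tree", body).hex()

monday = {"README": b"hello\n", "main.c": b"int main(void) { return 0; }\n"}
tuesday = dict(monday)                    # nothing changed
wednesday = dict(monday, README=b"hi\n")  # one file edited

print(tree_id(monday) == tree_id(tuesday))    # True: identical directory, one object
print(tree_id(monday) == tree_id(wednesday))  # False: a new directory object is needed
```

Note that on Wednesday only the README and the directory object are new; main.c keeps its old name and is not stored again.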

Based on this, it's pretty easy to see that if two commits have completely identical content, then the only thing that differs is the commit object itself, which will have a time stamp and user comment.
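
As a rough sketch of that point (in Python, with a made-up user, made-up timestamps, and a hypothetical helper commit_id), a commit object is a small piece of text naming the top-level directory object plus who, when, and the comment. Two commits of the same files point at the same directory object but still get different names:

```python
import hashlib

# Name of a directory object with no entries; stands in for "the same files".
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"

def commit_id(tree_hex: str, who: str, timestamp: int, message: str) -> str:
    """Name of a parentless commit (simplified: author and committer are the same)."""
    ident = f"{who} {timestamp} +0000"
    body = (
        f"tree {tree_hex}\n"
        f"author {ident}\n"
        f"committer {ident}\n"
        f"\n{message}\n"
    ).encode("utf-8")
    header = b"commit " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()

who = "Example User <user@example.com>"  # hypothetical identity
first = commit_id(EMPTY_TREE, who, 1463771784, "first snapshot")
second = commit_id(EMPTY_TREE, who, 1463771790, "same files, new comment")

print(first != second)  # True: only the small commit objects differ
```

Everything below the commit (the directory object and the files) is shared between the two.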

(The middle layer, by the way, is a set of low-level tools designed to work with the files in this filesystem.)


--
Sharan,

In addition to Michael's description, Git does have a method for compressing its repository, which it uses where possible, called pack files.

So rather than recording changes (as noted), Git will record complete snapshots, and then compress the full history of all revisions in one go (see some of Linus's laws).
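
A loose illustration of why compressing everything in one go pays off, using Python's zlib module. This is not the pack file format (real packs also store objects as deltas against similar objects); it only shows that many nearly identical full snapshots compress very well together. The file contents are invented.

```python
import zlib

# Thirty full snapshots of a 40-line file, each with one line edited.
base = [f"line {i}: unchanged text\n" for i in range(40)]
snapshots = []
for rev in range(30):
    lines = list(base)
    lines[rev] = f"line {rev}: edited in revision {rev}\n"
    snapshots.append("".join(lines).encode("ascii"))

raw_total = sum(len(s) for s in snapshots)         # every snapshot stored in full
packed = len(zlib.compress(b"".join(snapshots), 9))  # all history compressed together

print("stored as separate full copies:", raw_total, "bytes")
print("compressed in one go:          ", packed, "bytes")
```

The exact numbers depend on the data, but the compressed total comes out at a small fraction of the raw total because the snapshots overlap so heavily.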

The compressed repository (with all its history) can be smaller than the checked-out work tree, so it is efficient to hold the whole snapshot. There is also a whole load of SHA-1 hash keys that pervade and validate the history, which is good, as you always know that if your hash key has the same value as their hash key then they are seeing exactly the same history and content as you, no matter how far away and unknown they are to you. (And if the keys differ, all bets are off!)
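
The reason one hash key can vouch for the whole history is that each commit object records the key of its parent, so the newest key depends on everything before it. A minimal Python sketch, with a made-up author and the hypothetical helpers commit_id and tip_of:

```python
import hashlib

EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"     # tree with no entries
AUTHOR = "Example User <user@example.com> 1463771784 +0000"  # hypothetical identity

def commit_id(tree_hex, parent_hex, message):
    """Name of a commit with at most one parent."""
    lines = [f"tree {tree_hex}"]
    if parent_hex:
        lines.append(f"parent {parent_hex}")  # this line chains the history together
    lines += [f"author {AUTHOR}", f"committer {AUTHOR}", "", message]
    body = ("\n".join(lines) + "\n").encode("utf-8")
    header = b"commit " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()

def tip_of(messages):
    """Build a linear history from the messages and return the newest key."""
    tip = None
    for message in messages:
        tip = commit_id(EMPTY_TREE, tip, message)
    return tip

mine = tip_of(["start", "add feature", "fix bug"])
theirs = tip_of(["start", "add feature", "fix bug"])
tampered = tip_of(["start", "add backdoor", "fix bug"])

print(mine == theirs)    # True: same key, same history
print(mine == tampered)  # False: an old commit changed, so the newest key changed
```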

Philip
