----- Original Message -----
From: Sharan Basappa
To: Git for human beings
Sent: Saturday, May 21, 2016 4:53 AM
Subject: Re: [git-users] How GIT stores data
Sure. Think of Git as a three-layered tool.
The top layer is a polished interface, called "Porcelain", that is
designed to easily manage snapshots and compares and merges of filesystem
The bottom layer, on the other hand, is a filesystem. Files in this
filesystem are read-only. The names of files are fixed based on their content.
So identical files have the same name, and are stored once in the file system.
Built on top of these fixed files that do not change are directory objects,
which map human-readable filenames to internal names. And, since a directory
object is itself a filesystem object, if everything in a directory is identical,
then the directory entry is identical, and is only stored once.
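To make the "name is fixed by content" idea concrete, here is a minimal Python sketch of how a blob's name can be computed. It assumes a repository using the default SHA-1 object format; the function name git_blob_id and the sample licence text are just illustrations, not anything from Git itself.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object name for a blob holding `content`."""
    # Git hashes a short header ("blob <size>" plus a NUL byte)
    # followed by the raw file bytes. The filename is not part of it.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Two files with identical bytes get the same name, so one stored copy serves both.
licence_a = b"Permission is hereby granted...\n"
licence_b = b"Permission is hereby granted...\n"
same_name = git_blob_id(licence_a) == git_blob_id(licence_b)
print(same_name)

# Any change to the bytes gives a different name.
print(git_blob_id(licence_a) != git_blob_id(licence_a + b"extra line\n"))
```

If you have Git installed, the result for a given file should match what `git hash-object <file>` prints in a SHA-1 repository.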
Based on this, it's pretty easy to see that if two commits are completely
identical, then the only thing that differs is the commit object itself, which
will have a time stamp and user comment.
(The middle layer, by the way, consists of low-level tools designed to work with
the files in this filesystem.)
Dear Michael & Philip,
Thanks. I think I am getting the hang of it.
So, when an existing file is modified then I assume that Git computes its
signature and then checks if such a file already exists.
Is this correct? I ask this because my change can be such that it is the same as
one that was previously committed (sort of reverting a file).
The other thing I understand is that Git always stores every unique instance
of a file as it is and not its differences with a reference file.
One more question I have is about the file system. When I clone a repository,
do I get the full repository plus one set of project files (depending on the
branch I have checked out) locally?
Git cheats regarding the initial detection of file modification - it just uses
the file system's modified time (mtime), file size, and a couple of other easy
to determine values that are reliable indicators.
Even if it gets it wrong, because Git is snapshot based, it would have taken a
snapshot anyway! Plus, because the snapshot is identical, it wouldn't have
actually added anything to the repository (it already had the copy ;-). These
file content snapshots are called 'blobs'.
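As a rough illustration of the two points above (a cheap stat check first, then a content snapshot that adds nothing if the blob is already known), here is a toy Python sketch. ToyStore, stat_signature and the file name notes.txt are hypothetical names made up for this example; real Git keeps its cached stat values in the index and compares more fields than the two used here.

```python
import hashlib
import os
import tempfile


def blob_id(content: bytes) -> str:
    # Same header-plus-content hashing Git uses for blobs (SHA-1 repositories).
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def stat_signature(path):
    # Simplified stand-in for the cheap values compared before reading a file.
    st = os.stat(path)
    return (st.st_mtime_ns, st.st_size)


class ToyStore:
    """Toy object store: blobs keyed by their content-derived name."""

    def __init__(self):
        self.blobs = {}

    def snapshot(self, path):
        with open(path, "rb") as f:
            content = f.read()
        oid = blob_id(content)
        self.blobs.setdefault(oid, content)  # no-op if already stored
        return oid


def write(path, data):
    with open(path, "wb") as f:
        f.write(data)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "notes.txt")
    store = ToyStore()

    write(path, b"version 1\n")
    first = store.snapshot(path)
    cached = stat_signature(path)

    # Rewrite identical bytes and push mtime forward: the cheap check says
    # "modified", but the snapshot turns out to be the blob we already have.
    write(path, b"version 1\n")
    later = cached[0] + 1_000_000_000
    os.utime(path, ns=(later, later))
    looks_modified = stat_signature(path) != cached
    second = store.snapshot(path)
    count_after_false_alarm = len(store.blobs)

    # A real change adds one blob; reverting to the old content adds nothing.
    write(path, b"version 2\n")
    store.snapshot(path)
    write(path, b"version 1\n")
    reverted = store.snapshot(path)
    count_after_revert = len(store.blobs)

print(looks_modified, first == second, count_after_false_alarm, count_after_revert)
```

The revert at the end is the case asked about: the reverted file hashes to the blob that was stored the first time, so only two blobs exist in total.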
Note that at each stage it is the content, not the metadata, that is stored, so
renaming a file doesn't change its content and nothing new is stored at that
level (it's the same blob). However, at the 'directory tree' level (what 'ls'
or 'dir' lists), the content (of the tree's description) has changed, and
it's stored there (these are called 'trees'). So you can have as many copies of
a LICENCE or COPYING file as you like and all that extra content takes no extra
file space (it's one single blob), with only a small amount of space for the
tree, and if that doesn't change from commit to commit, then no need for another
Do note that there is no file date information stored in the tree/blob data.
The only place dates are recorded is at the point of the commit. Likewise the
only file permission stored is the *nix executable bit.
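Here is a simplified Python sketch of what a tree holds, to show why a rename or an executable-bit change produces a new tree but no new blob, and why dates play no part. It assumes SHA-1 object names and a flat directory of plain files only (real trees can also list sub-trees and symlinks, with slightly different sorting for directories); object_id, tree_id and the sample file contents are illustrative names, not Git APIs.

```python
import hashlib


def object_id(kind: bytes, body: bytes) -> bytes:
    # Every object is hashed as "<kind> <size>" + NUL + body.
    header = kind + b" " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).digest()


def tree_id(entries) -> str:
    """entries: list of (mode, name, file_content) for plain files."""
    body = b""
    for mode, name, content in sorted(entries, key=lambda e: e[1]):
        # Each entry is just: mode, name, and the blob's name. No dates.
        body += mode + b" " + name + b"\0" + object_id(b"blob", content)
    return object_id(b"tree", body).hex()


text = b"licence text\n"

original = tree_id([(b"100644", b"COPYING", text)])
renamed = tree_id([(b"100644", b"LICENCE", text)])     # same blob, new name
executable = tree_id([(b"100755", b"COPYING", text)])  # same blob, exec bit set
two_copies = tree_id([(b"100644", b"COPYING", text),
                      (b"100644", b"LICENCE", text)])   # two names, one blob

print(original != renamed, original != executable, original != two_copies)
# Rebuilding the same tree later gives the same name: nothing time-dependent.
print(original == tree_id([(b"100644", b"COPYING", text)]))
```

In every case above the blob for the licence text is the same single object; only the small tree description differs.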
In answer to the clone question: yes, you get a full copy. You can check out the
file tree at any point in the project's history (using the various revision
specification methods - more fun), though more frequently it is the tree at the
tip of one of the branches.
There is also a whole load of stuff to discover about 'remote tracking
branches' (which are local branches which track a remote system), and realising
that they are actually local, and just part of your local history tree, and
it's only a naming convention....