----- Original Message ----- 
  From: Sharan Basappa 
  To: Git for human beings 
  Sent: Saturday, May 21, 2016 4:53 AM
  Subject: Re: [git-users] How GIT stores data

      Sure. Think of Git as a three layered tool. 

      The top layer is a polished interface, called "Porcelain", that is 
designed to easily manage snapshots and compares and merges of filesystem 

      The bottom layer, on the other hand, is a filesystem. Files in this 
filesystem are read-only. The names of files are fixed based on their content. 
So identical files have the same name, and are stored once in the file system. 

      Built on top of these fixed, unchanging files are directory objects, 
that map human understandable filenames to internal names. And, since this is 
itself a filesystem object, if everything in a directory is identical, then the 
directory entry is identical, and only stored once. 

      Based on this, it's pretty easy to see that if two commits are completely 
identical, then the only thing that differs is the commit object itself, which 
will have a time stamp and user comment. 

      (The middle layer, by the way, consists of low-level tools designed to work with 
the files in this filesystem.) 

  Dear Michael & Philip,

  Thanks. I think I am getting the hang of it.

  So, when an existing file is modified then I assume that Git computes its 
signature and then checks if such a file already exists.
  Is this correct? I ask this because my change can be such that it is same as 
one that was previously committed (sort of reverting back a file).

  The other thing I understand is that Git always stores every unique instance 
of a file as it is and not its differences with a reference file.

  One more question I have is on the file system. When I clone a 
repository, I get the full repository plus one set of project files locally 
(depending on the branch I have checked out). Is that right?


Git cheats regarding the initial detection of file modification - it just uses 
the file system's modified time (mtime), file size, and a couple of other 
easy-to-determine values that are reliable indicators. 
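
As a rough illustration of that shortcut, here is a minimal Python sketch. It 
is not Git's code: the function names are made up for this example, and Git's 
real index records more fields than the two compared here. The idea is only 
that comparing cheap stat values avoids re-reading every file.

```python
import os
import tempfile


def stat_signature(path):
    """Return cheap-to-read values used to guess whether a file changed.

    Hypothetical helper: only mtime and size are used here, as a simplification.
    """
    st = os.stat(path)
    return (st.st_mtime_ns, st.st_size)


def probably_modified(path, cached_signature):
    """True if the file's stat data differs from what was recorded earlier."""
    return stat_signature(path) != cached_signature


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.txt")
        with open(path, "w") as f:
            f.write("first version\n")
        cached = stat_signature(path)
        print(probably_modified(path, cached))  # nothing touched the file yet

        with open(path, "a") as f:
            f.write("an extra line\n")  # the size changes, so the check fires
        print(probably_modified(path, cached))
```

Only when this quick check says "probably modified" does the content itself 
need to be read and hashed.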

Even if it gets it wrong, because Git is snapshot based, it would have taken a 
snapshot anyway! Plus, because the snapshot is identical, it wouldn't have 
actually added anything to the repository (it already had the copy ;-). These 
file content snapshots are called 'blobs'.
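
To make the "identical snapshot adds nothing" point concrete, the sketch below 
computes the name Git gives a blob, assuming the default SHA-1 object format: 
the hash of a short "blob <size>" header, a NUL byte, and the file content. 
The function name git_blob_id is just for this example.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the object name of a blob (assuming the SHA-1 object format)."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    original = b"hello\n"
    edited = b"hello world\n"
    reverted = b"hello\n"  # same bytes as the original

    print(git_blob_id(original))   # ce013625030ba8dba906f756967f9e9ca394464a
    print(git_blob_id(edited))     # a different name
    print(git_blob_id(reverted) == git_blob_id(original))  # True
```

Because the name depends only on the bytes, reverting a file to earlier content 
produces a name the repository already holds, so nothing new is written. The 
first value should match what `git hash-object` reports for the same content.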

Note that at each stage it is the content, not the metadata, that is stored, so 
renaming a file doesn't change its content and nothing new is stored at that 
level (it's the same blob). However, at the 'directory tree' level (what 'ls' 
or 'dir' lists), the content (of the tree's description) has changed, and 
it's stored there (these are called 'trees'). So you can have as many copies of 
a LICENCE or COPYING file as you like and all that extra content takes no extra 
file space (it's one single blob), with only a small amount of space for the 
tree, and if that doesn't change from commit to commit, then no need for another 

Do note that there is no file date information stored in the tree/blob data. 
The only place dates are recorded is at the point of the commit. Likewise the 
only file permission stored is the *nix executable bit.
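
The blob/tree split can be sketched with a toy content-addressed store in 
Python. This is an illustration under stated assumptions, not Git's 
implementation: ToyStore and its methods are hypothetical, and the tree is 
encoded as plain text lines rather than Git's real binary tree format, so the 
tree ids here will not match real Git ids. Note that a tree entry carries only 
a mode, a name, and an object id, with no dates.

```python
import hashlib


class ToyStore:
    """A toy content-addressed object store (hypothetical, for illustration)."""

    def __init__(self):
        self.objects = {}  # object id -> (kind, payload)

    def put(self, kind: str, payload: bytes) -> str:
        header = kind.encode("ascii") + b" " + str(len(payload)).encode("ascii") + b"\0"
        oid = hashlib.sha1(header + payload).hexdigest()
        # Storing the same content twice is a no-op: the id already exists.
        self.objects.setdefault(oid, (kind, payload))
        return oid

    def put_blob(self, content: bytes) -> str:
        return self.put("blob", content)

    def put_tree(self, entries: dict) -> str:
        """entries maps a file name to a (mode, object id) pair."""
        # Simplified text encoding; real Git trees use a binary layout.
        lines = [f"{mode} {name} {oid}" for name, (mode, oid) in sorted(entries.items())]
        return self.put("tree", "\n".join(lines).encode("utf-8"))


if __name__ == "__main__":
    store = ToyStore()
    licence = store.put_blob(b"Some licence text\n")
    copy = store.put_blob(b"Some licence text\n")    # identical content
    before = store.put_tree({"LICENCE": ("100644", licence)})
    after = store.put_tree({"COPYING": ("100644", copy)})  # a rename

    print(licence == copy)     # True: one blob, however many copies
    print(before == after)     # False: the tree's own content changed
    print(len(store.objects))  # 3: one blob and two trees
```

The rename costs one small new tree and no new blob, and an unchanged 
directory reuses its existing tree id from one commit to the next.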

In answer to the clone question: yes, you get a full copy. You can check out the 
file tree at any point in the project's history (using the various revision 
specification methods - more fun), though more frequently it is the tree at the 
tip of one of the branches.

There is also a whole load of stuff to discover about 'remote tracking 
branches' (which are local branches which track a remote system), and realising 
that they are actually local, and just part of your local history tree, and 
it's only a naming convention....

