----- Original Message -----
From: Sharan Basappa
To: Git for human beings
Sent: Saturday, May 21, 2016 4:53 AM
Subject: Re: [git-users] How GIT stores data
>> Sure. Think of Git as a three-layered tool. The top layer is a polished interface, called "Porcelain", that is designed to make it easy to manage snapshots, and to compare and merge filesystem trees. The bottom layer, on the other hand, is a filesystem. Files in this filesystem are read-only, and their names are fixed by their content, so identical files have the same name and are stored only once.
>>
>> Built on top of these fixed, unchanging files are directory objects that map human-readable filenames to internal names. Since a directory is itself a filesystem object, if everything in a directory is identical, then the directory entry is identical too, and is stored only once. From this it's easy to see that if the contents of two commits are completely identical, the only thing that differs is the commit object itself, which holds a timestamp and the user's comment.
>>
>> (The middle layer, by the way, consists of low-level tools, known as "plumbing", designed to work with the files in this filesystem.)
>
> Dear Michael & Philip,
>
> Thanks. I think I am getting the hang of it.
>
> So, when an existing file is modified, I assume that Git computes its signature and then checks whether such a file already exists. Is this correct? I ask because my change could make the file the same as a version that was previously committed (sort of reverting a file).
>
> The other thing I understand is that Git always stores every unique instance of a file as it is, and not its differences from a reference file.
>
> One more question I have is about the file system. My understanding is that when I clone a repository, I get the full repository locally, plus one set of project files (depending on the branch I have checked out). Is that right?
>
> Thanks,

Git cheats regarding the initial detection of file modification: it just uses the file system's modified time (mtime), the file size, and a couple of other easy-to-determine values that are reliable indicators.
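The content-based naming and the cheap change check described above can be sketched in a few lines of Python. This is a minimal illustration, assuming Git's classic SHA-1 object format (repositories can also be set up to use SHA-256). `ToyObjectStore`, `stat_key` and `snapshot` are hypothetical names, and the stat fingerprint is a simplification: Git's index caches more stat fields than the mtime and size used here.

```python
import hashlib
import os
import tempfile


def blob_id(content: bytes) -> str:
    """Name content the way Git's SHA-1 object format does:
    SHA-1 over the header b"blob <size>\\0" followed by the raw bytes."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


class ToyObjectStore:
    """Hypothetical in-memory stand-in for .git/objects.
    Objects are keyed by their own hash, so adding identical content
    a second time stores nothing new."""

    def __init__(self):
        self.objects = {}

    def add(self, content: bytes) -> str:
        oid = blob_id(content)
        self.objects.setdefault(oid, content)  # no-op if already present
        return oid


def stat_key(path: str) -> tuple:
    """Cheap change fingerprint: mtime (ns) and size only (simplified)."""
    st = os.stat(path)
    return (st.st_mtime_ns, st.st_size)


def snapshot(path: str, cache: dict, store: ToyObjectStore) -> str:
    """Return the blob id for `path`, re-reading and hashing the file only
    when its stat fingerprint differs from the cached one."""
    key = stat_key(path)
    cached = cache.get(path)
    if cached is not None and cached[0] == key:
        return cached[1]  # looks unchanged: trust the cached id
    with open(path, "rb") as f:
        oid = store.add(f.read())  # a false "changed" only costs a re-hash
    cache[path] = (key, oid)
    return oid


if __name__ == "__main__":
    store, cache = ToyObjectStore(), {}
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "notes.txt")
        # Edit the file, then revert it to its first version.
        for text in (b"v1\n", b"version 2\n", b"v1\n"):
            with open(path, "wb") as f:
                f.write(text)
            print(snapshot(path, cache, store))
    print(len(store.objects))  # 2: the revert reused the first blob
```

`blob_id(b"hello\n")` should give the same id as `echo hello | git hash-object --stdin`. Note that the toy stat check can miss an edit that keeps the same size within a single timestamp tick; real Git has extra safeguards for that case ("racy git") which this sketch does not model.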
Even if it gets it wrong, because Git is snapshot-based, it would have taken a snapshot anyway! And because the snapshot is identical, it wouldn't actually have added anything to the repository (it already had the copy ;-).

These file content snapshots are called 'blobs'. Note that at each stage it is the content, not the metadata, that is stored, so renaming a file doesn't change its content and nothing new is stored at that level (it's the same blob). However, at the 'directory tree' level (what 'ls' or 'dir' lists), the content of the tree's description has changed, and that is stored (these are called 'trees'). So you can have as many copies of a LICENCE or COPYING file as you like and all that extra content takes no extra space (it's one single blob), with only a small amount of space for the tree; and if the tree doesn't change from commit to commit, there is no need for another copy...

Do note that there is no file date information stored in the tree/blob data. The only place dates are recorded is in the commit itself. Likewise, the only file permission stored is the *nix executable bit.

In answer to the clone question: yes, you get a full copy. You can check out the file tree at any point in the project's history (using the various revision specification methods; more fun there), though more often it is the tree at the tip of one of the branches. There is also a whole load of stuff to discover about 'remote-tracking branches' (which are local branches that track a remote system), and realising that they are actually local, just part of your local history tree, and that it's only a naming convention...

--
Philip
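The blob/tree split Philip describes can be illustrated with a small Python sketch. It assumes Git's classic SHA-1 object format and a flat directory with no subdirectories (real trees also hold entries for subdirectories and symlinks). The function names and the placeholder file contents are hypothetical; the point is only that file names and the executable bit live in the tree, blobs hold bare content, and no date goes into either hash.

```python
import hashlib


def git_object_id(kind: str, body: bytes) -> str:
    """SHA-1 over Git's object header b"<kind> <size>\\0" plus the body."""
    header = b"%s %d\x00" % (kind.encode(), len(body))
    return hashlib.sha1(header + body).hexdigest()


def blob_id(content: bytes) -> str:
    return git_object_id("blob", content)


def tree_id(files: dict, executable: frozenset = frozenset()) -> str:
    """Hash a flat directory given as {filename: content}.

    Each entry is b"<mode> <name>\\0" followed by the 20 raw bytes of the
    blob's SHA-1; entries are sorted by name. The only per-file metadata
    is the mode (100755 if executable, else 100644): no dates, no owner.
    Subdirectories are not modelled in this sketch.
    """
    body = b""
    for name in sorted(files, key=lambda n: n.encode()):
        mode = b"100755" if name in executable else b"100644"
        raw_sha = bytes.fromhex(blob_id(files[name]))
        body += mode + b" " + name.encode() + b"\x00" + raw_sha
    return git_object_id("tree", body)


if __name__ == "__main__":
    licence = b"placeholder licence text\n"
    code = b"int main(void) { return 0; }\n"
    before = {"LICENCE": licence, "main.c": code}
    renamed = {"COPYING": licence, "main.c": code}
    both = {"LICENCE": licence, "COPYING": licence, "main.c": code}

    # Renaming leaves the blob alone but changes the tree.
    print(blob_id(before["LICENCE"]) == blob_id(renamed["COPYING"]))  # True
    print(tree_id(before) == tree_id(renamed))                        # False
    # Two copies of the licence still point at one blob id.
    print(len({blob_id(c) for c in both.values()}))                   # 2
    # An unchanged directory hashes to the same tree id every time.
    print(tree_id(before) == tree_id(dict(before)))                   # True
    # Flipping the executable bit is a tree change.
    print(tree_id(before) == tree_id(before, frozenset({"main.c"})))  # False
```

`tree_id({})` should reproduce Git's well-known empty-tree id; for real files, comparing against `git ls-tree` output on a small test repository is a reasonable way to check the sketch.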