Re: Understanding Git Under The Hood: Trees

Andreas Ericsson Fri, 16 Aug 2013 02:12:57 -0700

On 2013-08-15 21:32, Erik Bernoth wrote:
> On Thu, Aug 15, 2013 at 7:31 PM, Junio C Hamano <[email protected]> wrote:
>> While the last statement applies to other parts of the system, it is
>> not true for the in-core index design.  We always had a flat index,
>> and it is not cheating at all.  The original "tree" was also a flat
>> representation of everything under the sun, and hierarchical tree
>> objects came much later.
>
> To some degree that revalidates my interpretation of Andreas'
> statements. If I understand it correctly, each time a shell command
> that requires tree interaction is executed, the corresponding tree is
> read from the filesystem into memory completely before anything is done?



More or less, yes, but please don't confuse "directory tree" with "git
tree". They're not the same. A directory tree can contain multiple
levels of directories, whereas a git tree can only contain a list of
objects. The index (aka "staging area") represents a directory tree,
but when it gets written to the object store, the directory tree gets
broken down into as many git trees as necessary.
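
As a rough illustration of that breakdown, the sketch below builds a
throwaway repository with a directory tree two levels deep and counts
the git tree objects behind the resulting commit. The file names, the
directory layout and the identity settings are made up for the example.

```shell
set -e
# Throwaway repository; every path and name below is hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example"
git config user.email "example@example.invalid"

# A directory tree two levels deep.
mkdir -p a/b
echo top > top.txt
echo mid > a/mid.txt
echo leaf > a/b/leaf.txt
git add .
git commit -q -m "nested directories"

# The root git tree lists only its direct children: one tree (a) and
# one blob (top.txt).
git ls-tree HEAD

# Recursing with -t also prints the tree entries for a and a/b.
subtrees=$(git ls-tree -r -t HEAD | awk '$2 == "tree"' | wc -l)
echo "git trees behind this commit: $((subtrees + 1))"
```

One directory tree, three git trees: the root, "a" and "a/b". Each of
them is a flat list that refers to the level below by object name.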

The index is just a cache though. Until changes have been staged to it
in preparation for the next commit, it can be recreated exactly from
the currently checked out commit. As Junio pointed out, the index has
been flat from the very beginning. Don't confuse the index with the
git tree objects found in the object storage though, or the working tree
with git trees. They're really not the same.
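
The "it's just a cache" point can be tried out in a scratch repository.
This is only a sketch, with made-up file names: it deletes the index
file and rebuilds it from HEAD with "git read-tree", then compares the
entry listings. Don't do this in a repository with staged work you care
about.

```shell
set -e
# Throwaway repository; every path and name below is hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example"
git config user.email "example@example.invalid"

mkdir dir
echo one > dir/one.txt
echo two > two.txt
git add .
git commit -q -m "initial"

before=$(git ls-files -s)

# Throw the index away and rebuild it from the checked-out commit.
rm .git/index
git read-tree HEAD
after=$(git ls-files -s)

if [ "$before" = "$after" ]; then
  echo "index entries rebuilt identically"
fi
```

The comparison covers what "git ls-files -s" prints (mode, object name,
stage and path). The cached file stat data is not part of that listing;
Git refreshes it when needed.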

To illustrate the differences, here are a few commands, what they do
and what they operate on, with regard to the three different kinds of
trees that have come up in this discussion.


Ignore everything git-related and only print the worktree:
  find .

Ignore everything index- and worktree-related and only print the root
git tree of the currently checked out commit. You won't see any
relative paths or directories in there; just a list of trees and blobs:
  git cat-file -p $(git cat-file -p HEAD | sed -n 's/^tree //p;q')


List the contents of the index only, regardless of what you have in the
worktree or what the latest commit looks like. This will look pretty
much like the last command, but with files located in subdirectories as
well, and an additional field holding the stage number (0 unless the
path is in a merge conflict):
  git ls-files -s
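
To see the three views disagree, here is a small sketch in a scratch
repository with one committed file, one staged file and one untracked
file. The file names are invented for the example, and it uses
"git ls-files" and "git ls-tree -r --name-only" instead of the commands
above so that only path names are compared.

```shell
set -e
# Throwaway repository; every path and name below is hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example"
git config user.email "example@example.invalid"

echo committed > committed.txt
git add committed.txt
git commit -q -m "initial"

echo staged > staged.txt
git add staged.txt               # in the index, not yet in any commit
echo untracked > untracked.txt   # only in the worktree

# Worktree (skipping .git), index, and the tree of the current commit.
worktree=$(find . -path ./.git -prune -o -type f -print | sort)
index=$(git ls-files | sort)
head_tree=$(git ls-tree -r --name-only HEAD | sort)

printf 'worktree:\n%s\n\nindex:\n%s\n\nHEAD tree:\n%s\n' \
  "$worktree" "$index" "$head_tree"
```

The worktree shows all three files, the index shows two, and the tree
of the current commit shows only the committed one.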


> So
> if I git-add a file, the whole index is read first, then the memory
> object is changed and then the resulting change is written to disk
> bottom-up from the point of view of the tree?
>

When you git-add a file, we read in the index, update it with the new
contents of the file you pointed to, or add the new file to it if the
file isn't known to us yet. We also add the blob to the object store.
Then we write out the new index, and then we're done. No tree objects
are written at this point; the index stays flat.

For a simple "git-add", that's it, and you'll now see "git status" list
files as added to the index without being committed. They're what we
call "staged" at this point.

If you also do "git commit" after having done "git-add", we first write
out the new tree(s) from the index. We do that bottom up, as you say,
because a tree refers to its subtrees and blobs by object name, and
otherwise the object store would be inconsistent after we started
writing root objects but before we're done writing leaf objects. Then
we write out a commit object, pointing to its parent commit and the
root tree we just created. "git cat-file -p HEAD" will give you an
idea of how that looks.
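
The add-then-commit sequence can be followed step by step in a scratch
repository. The sketch below uses an invented file name and identity. It
checks that the staged entry already names a blob in the object store,
and then pulls the "tree" and "parent" lines out of the second commit.

```shell
set -e
# Throwaway repository; every path and name below is hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example"
git config user.email "example@example.invalid"

echo hello > greeting.txt
git add greeting.txt

# The staged entry names an object that is already in the object store.
blob=$(git ls-files -s greeting.txt | awk '{print $2}')
git cat-file -t "$blob"      # prints "blob"

git commit -q -m "first"
echo again >> greeting.txt
git add greeting.txt
git commit -q -m "second"

# A commit object names one root tree and, after the first commit,
# its parent commit.
git cat-file -p HEAD

tree=$(git cat-file -p HEAD | sed -n 's/^tree //p;q')
parent=$(git cat-file -p HEAD | sed -n 's/^parent //p')
echo "root tree: $tree"
echo "parent:    $parent"
```

The tree name printed here is the same one "git rev-parse HEAD^{tree}"
reports, and the parent is the commit made by the first "git commit".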

--
Andreas Ericsson                   [email protected]
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.