Wow! What a long mail! But I probably deserved it, quoting that lengthy
mail from Junio...
On Fri, 5 Aug 2005, Linus Torvalds wrote:
> On Fri, 5 Aug 2005, Johannes Schindelin wrote:
> > Tutorial says "cache" aka "index". Though technically, a cache
> > is the index file _plus_ the related objects in the object database.
> > git-update-cache.txt even distinguishes between the "index"
> > and the "directory cache".
> I think we should globally rename it to "index".
Totally agree. The index is a central concept. But let's keep in mind --
and make future Documentation/ readers do the same -- that the index,
without the referenced objects in the object database, is only a
> The "directory cache" and later "cache" naming came from when I started
> doing the work - before git was even git at all, and had no backing store
> what-so-ever, I started out writing "cache.h" and "read-cache.c", and it
> was really first a trial at doing a totally SCM-neutral directory cache
> You don't even see that in the git revision history, because that was
> before git was self-hosting - the project was partly started to also work
> as possibly just a fast front-end to something that wasn't as fast (ie
> think something like a front-end to make "monotone" work better).
> So the "directory cache" and "cache" naming comes from that historical
> background: it was really started as a front-end cache, and in fact the
> ".git" directory was called ".dircache" initially. You can see some of
> that in the very earliest git releases: by then I had already done the
> backing store, and the thing was already called "git", but the "dircache"
> naming still remains in places.
> For example, here's my "backup" target in the initial checkin:
> backup: clean
> 	cd .. ; tar czvf dircache.tar.gz dir-cache
> which shows that not only did I call the resulting tar file "dircache",
> the directory I was developing stuff in was called "dir-cache" as well ;)
> The index obviously ended up doing a lot more, and especially with the
> different stages it became much more than just a directory cache thing:
> it's integral to how git does the fast part of a merge. So we should call
> it "index" and edit out the old "cache" and "directory cache" naming.
I quoted this entirely, for a good reason: Linus, one day you really
should write a Wikibook about all the "small" projects you started. I
still remember the words "I'm doing a (free) operating system (just a
hobby, won't be big...". There's so much to be learnt about good
engineering. And people do want to add their anecdotes to it.
> > - the directory which corresponds to the top of the hierarchy
> > described in the index file; I've seen words like "working
> > tree", "working directory", "work tree" used.
> > The tutorial initially says "working tree", but then "working
> > directory". Usually, a directory does not include its
> > subdirectories, though. git-apply-patch-script.txt, git-apply.txt,
> > git-hash-object.txt, git-read-tree.txt
> > use "work tree". git-checkout-cache.txt, git-commit-tree.txt,
> > git-diff-cache.txt, git-ls-tree.txt, git-update-cache.txt contain
> > "working directory". git-diff-files.txt talks about a "working tree".
> I think we should use "working tree" throughout, since "working directory"
> is unix-speak for "pwd" and has a totally different meaning.
That's what I hoped.
> > - An index file can be in "merged" or "unmerged" state. The
> > former is when it does not have anything but stage 0 entries,
> > the latter otherwise.
> I think the "unmerged" case should be mentioned in the "cache entry"
> thing, since it's really a per-entry state, exactly like "dirty/clean".
> Then, explaining an "unmerged index" as being an index file with some
> entries being unmerged makes more sense.
> As it is, the above "explains" an index file as being unmerged by talking
> about "stage 0 entries", which in turn haven't been explained at all.
That's right. We probably should copy a bit from git-read-tree.txt, or at
least reference it in the glossary.
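A toy sketch of what that per-entry definition could look like. The IndexEntry class and the helper names are made up for illustration, and this is not the real on-disk index format; it only assumes what git-read-tree.txt describes: stage 0 means merged, while stages 1, 2 and 3 hold the common ancestor, "ours" and "theirs" during a three-way merge.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexEntry:
    """Hypothetical in-memory model of one index entry (not git's file format)."""
    path: str
    stage: int        # 0 = merged; 1 = common ancestor, 2 = ours, 3 = theirs
    object_name: str  # sha1 of the blob recorded for this path and stage


def entry_is_unmerged(entry: IndexEntry) -> bool:
    # A per-entry state, just like dirty/clean: any stage other than 0 is unmerged.
    return entry.stage != 0


def index_is_unmerged(entries: list[IndexEntry]) -> bool:
    # The index as a whole is unmerged as soon as one entry is.
    return any(entry_is_unmerged(e) for e in entries)


def unmerged_paths(entries: list[IndexEntry]) -> list[str]:
    # Each conflicting path may appear up to three times (stages 1-3); report it once.
    return sorted({e.path for e in entries if entry_is_unmerged(e)})
```

With this reading, "unmerged" is defined for entries first, and "unmerged index" needs no extra explanation beyond "an index with at least one unmerged entry".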
> > - A "tree object" can be recorded as a part of a "commit
> > object". The tree object is said to be "associated with" the
> > commit object.
> > In diffcore.txt, "changeset" is used in place of "commit".
> We really should use "commit" throughout. ex-BK users sometimes slip into
> "changeset" (which in turn is probably because BK had these per-file
> commits too - deltas), but there's no point in the distinction in git. A
> commit is a commit.
That is, if you don't do "git-update-cache <single-file>" (which is not
possible with some porcelains).
Apart from that: I think that it is quite important to make the
distinction between a "commit" and a "commit object". Newbies (in that
case, people working with CVS are newbies to the concepts of git, too)
tend to understand what you say better if you make that distinction very
> > - The following objects are collectively called "tree-ish": a
> > tree object, a commit object, a tag object that resolves to
> > either a commit or a tree object, and can be given to
> > commands that expect to work on a tree object.
> > We could call this category an "ent".
> LOL. You are a total geek.
I take that as a compliment :-)
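Since "tree-ish" really means "anything that can be peeled down to a tree", a minimal sketch may help the glossary entry. It assumes a made-up in-memory object store mapping an object name to a (type, payload) pair, where a commit's payload is the name of its tree and a tag's payload is the name of the object it points at; peel_to_tree is a hypothetical helper, not a git command.

```python
def peel_to_tree(store: dict[str, tuple[str, str]], name: str) -> str:
    """Return the name of the tree that a tree-ish resolves to (toy model)."""
    while True:
        obj_type, payload = store[name]
        if obj_type == "tree":
            return name
        if obj_type in ("commit", "tag"):
            # A commit records exactly one tree; a tag points at exactly one object.
            name = payload
        else:
            # E.g. a blob, or a tag that ultimately points at a blob.
            raise ValueError(f"{name} is a {obj_type}, not a tree-ish")
```

A tag pointing at a tag pointing at a commit still resolves, while a tag pointing at a blob does not, which matches the "tag object that resolves to either a commit or a tree object" wording above.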
> > - The files under $GIT_DIR/refs record object names, and are
> > called "refs". What is under refs/heads/ are called "heads",
> > refs/tags/ "tags". Typically, they are either object names
> > of commit objects or tag objects that resolve to commit
> > objects, but a tag can point at any object.
> > The tutorial never calls them "refs", but instead "references".
> It might be worth saying explicitly that a reference is nothing but the
> same thing as an "object name" aka "sha1". And make it very clear that it
> can point to any object type, although commits tend to be the most common
> thing you want to reference. That then leads naturally into a very specific
> _subcase_ of refs, namely a "head":
Do not forget signed tags! Strictly speaking, these are references to
references which are signed.
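For the glossary it might help to show how little a ref actually is. A minimal sketch, assuming every ref is a plain file somewhere under $GIT_DIR/refs holding one object name plus a newline; read_refs and heads are hypothetical helpers, not git commands.

```python
import os


def read_refs(git_dir: str) -> dict[str, str]:
    """Map ref names such as "refs/heads/master" to the object names they hold.

    Assumes each ref is a plain file containing a single object name.
    """
    refs = {}
    for dirpath, _dirnames, filenames in os.walk(os.path.join(git_dir, "refs")):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            name = os.path.relpath(path, git_dir).replace(os.sep, "/")
            with open(path) as f:
                refs[name] = f.read().strip()
    return refs


def heads(refs: dict[str, str]) -> dict[str, str]:
    # "Heads" are just the subset of refs living under refs/heads/.
    prefix = "refs/heads/"
    return {name[len(prefix):]: value
            for name, value in refs.items() if name.startswith(prefix)}
```

Note that the sketch never looks at the objects themselves, so it cannot check that a head really names a commit; a ref file only ever records an object name.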
> > - A "head" is always an object name of a commit, and marks the
> > latest commit in one line of development. A line of
> > development is often called a "branch". We sometimes use the
> > word "branch head" to stress the fact that we are talking
> > about a single commit that is the latest one in a "branch".
> > In the tutorial, the latter is used in reverse: it talks about a
> > "HEAD development branch" and a "HEAD branch".
Actually, I don't think it's a good idea to talk about a "HEAD branch" or
"development branch". I'd prefer "branch".
> > I find it a little bit troublesome that $GIT_DIR/branches does not
> > really refer to a branch, but rather to a (possibly remote) repository.
> Yes, I find the $GIT_DIR/branches naming to be confusing too.
I don't know if we can hide it from the users, or if we should bite the
bullet and rename it to "remotes/", or even better "repositories/".
> Jeff has been dragging me into the "local branches are good" camp, and
> these days I'm obviously a big believer.
I think that Jeff really deserves the credit for this. Yours truly was
convinced that one repository should hold one branch only. But I was
convinced otherwise, too.
> > - The act of finding out the object names recorded in "refs" a
> > different repository records, optionally updating a local
> > "refs" with their values, and retrieving the objects
> > reachable from them is called "fetching". Fetching immediately
> > followed by merging is called "pulling".
> > In that sense, git-http-pull would be more appropriately named
> > git-http-fetch, and analogously git-ssh-pull.
> > Also, git-pull-script.txt says "Pull and merge", contradicting this
> > definition.
> To confuse things even more, cogito calls a fetch "pull" and a pull
I really think this should be unified. Pasky?
> I personally think "fetch" is unambiguous: it's just the act of fetching,
> with no "merge" activity at all. So we should use that.
> What to call a "fetch+merge" is a bit ambiguous. I obviously prefer
> "pull", but cogito disagrees, and you're right, "git-http-pull" and
> "git-ssh-pull" both really do just fetches.
Let's rename them before 1.0.
> But I think "update" isn't right either: to me, update would be the
> non-merging kind (ie I think "update" implies "refresh" which in turn
> implies a "fetch"-like behaviour).
> So I'd vote for making the suggested definition official: "fetch" means
> fetching the data, and "pull" means "fetch + merge".
This should be discussed. Obviously, I come from CVS and understand
"update" to be what cogito says it is. But then, it is also true that
CVS's usage of "update" is misleading, because it really does a merge, not
forcing the user to do a commit before merge (because that is not possible
in CVS). So basically, I agree: "pull" is unambiguous as far as I am
> > - a "pack" usually consists of two files: a file containing objects
> > in a compressed format, and an index to the first file. If the
> > pack is uncompressed at once (e.g. when git-clone is called), the
> > index is not necessary.
> > git-pack-objects calls this a "packed archive" first, but then reverts
> > to "pack". git-show-index.txt and git-verify-pack.txt call the .pack file
> > "packed GIT archive", and the index "idx file". git-unpack-objects.txt
> > calls the .pack file "pack archive".
> We should just call them packs. An archive can be multiple packs and lots
> of non-packed objects too.
And they can have pack indices which do not relate at all to the central
> > "type": one of the identifiers "commit", "tree", "tag" and "blob" describing
> > the type of an object.
> Yes. Some old docs may call this type a "tag", since I was really thinking
> not in the SCM meaning at all, but in the _computer_architecture_
> meaning, where people usually call objects with enforced types "tagged".
> Ie from a computer architecture standpoint you can have "tagged memory" or
> "tagged pointers", and LISP machines are often implemented with the
> pointers containing the type ("tag") of the thing they point to (for
> example, the low two bits might be the "tag" on the pointer). So I was
> talking about "tagged objects" when I just meant that the type of the
> object was embedded in the object itself, the way tagged memory
> architectures work.
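To illustrate the computer-architecture sense of "tagged" (this is not something git does; the type names and tag values below are arbitrary), here is a toy tagged-pointer encoding that stores a type in the low two bits of a 4-byte-aligned address:

```python
TAG_BITS = 2
TAG_MASK = (1 << TAG_BITS) - 1  # 0b11: the two low bits of the word

# Arbitrary example types; four of them fit exactly into two bits.
TAGS = {"fixnum": 0, "cons": 1, "symbol": 2, "other": 3}
TAG_NAMES = {value: name for name, value in TAGS.items()}


def make_tagged(address: int, type_name: str) -> int:
    # Alignment guarantees the low bits of a real address are zero,
    # so they are free to carry the type of the thing pointed to.
    if address & TAG_MASK:
        raise ValueError("address must be 4-byte aligned")
    return address | TAGS[type_name]


def untag(tagged: int) -> tuple[int, str]:
    # Split the word back into the plain address and the type it carries.
    return tagged & ~TAG_MASK, TAG_NAMES[tagged & TAG_MASK]
```

The point of the analogy is that the type travels with the thing itself rather than being tracked separately.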
> In retrospect, that naming _really_ confused some people, I know I had
> trouble explaining git concepts to David Wheeler because I used "tagged
> objects" _not_ to mean a SCM style "tag", but to mean "typed objects".
> If somebody sees an old reference to "object tags", those should all be
> fixed to say "object types".
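In git itself, "the type is embedded in the object" means the type is part of the data that gets hashed. A minimal sketch, assuming the object header format `<type> <size>\0` followed by the raw content (zlib compression and storage are left out; object_name is a hypothetical helper):

```python
import hashlib


def object_name(obj_type: str, content: bytes) -> str:
    # The header "<type> <size>\0" is hashed together with the content,
    # so the object's type is part of what its name identifies.
    header = f"{obj_type} {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()
```

For example, object_name("blob", b"hello\n") gives ce013625030ba8dba906f756967f9e9ca394464a, the same name git-hash-object reports for a file containing "hello". Because the type is hashed too, the same bytes recorded as a different type get a different object name.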