Re: [git-users] any suggestions for pruning all upstream branches after a github fork?

Konstantin Khomoutov Tue, 30 Sep 2014 06:04:16 -0700

On Mon, 29 Sep 2014 22:34:55 -0700
Sam Roberts <[email protected]> wrote:


[...]
> github, for reasons lost to me, gives you a snapshot of all upstream
> branches at time of fork, but not any new branches created upstream,
> nor does it ever delete them when upstream deletes, or give any way to
> synchronize... not to complain. :-)

I have a vague feeling you're falling into a trap rather common to Git
newcomers: people, especially those coming from a centralized VCS (such
as Subversion), tend to assume that their local repository and their
remote repository are the same thing, just the local one might
temporarily contain some unpushed changes, the remote one might
temporarily contain some unfetched changes, and otherwise they are
identical.  Having assumed that, they in turn assume that any `git pull`
(or `git fetch` or whatever) is supposed to make their local repository
be much like the one they've just "synchronized with", that is, to
bring in "new" branches (appeared on the remote since the last "sync")
and delete those disappeared there.
In short, these assumptions are completely wrong.

There do indeed exist DVCS systems maintaining the model I've just
described; off the top of my head I can name Fossil [1]: it has true
"synching", where each branch in a remote repo "means" the same thing
as in local, and should you push and/or fetch conflicting changes
when synching with the remote, this just creates multiple "heads" in a
branch -- meaning diverged histories, and so on.  Moreover, each
repository contains a UUID which is checked when you try to exchange
history with a remote repository, and such exchange will fail right
away if UUIDs of the local and remote repos are not equal.

Git has been created with a very different mindset.

The main idea to absorb is that all repositories in Git are
independent.  Even when you have just a single remote repo configured
in your local one (typically it's named "origin"), and that repo is the
only one you ever communicate with, this really means nothing to Git,
and Git does not treat that remote repo in any special manner.
The independence of repositories has important repercussions.  The most
crucial is that a branch "master" in your local repo is not taken to
*mean* the same thing as a branch "master" in any other repo.
This is hard to grasp but bear with me.

The next idea to get hold of is that Git is fine with fetching commits
from any repository at all, and pushing them to any repository at all.
In other words, when it's about to exchange commits with another
repository, the commit graph is the only thing it considers: the
repositories do not have any "identity" instilled in them.  I mean,
you can take a local repository containing one of your weekend toy
projects and fetch there any branch from a repository maintaining the
Linux kernel source code -- it will work, even though, say, a branch
named "master" there contains commits in no way related to those on
your local branch "master".
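
To see this for yourself, here is a minimal sketch.  The repository
names "toy" and "other" and the throwaway directory are made up for
illustration; "other" merely stands in for any unrelated project:

```shell
set -e
# Throwaway identity so the commits work on a machine with no Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

work=$(mktemp -d)

# Two repositories with no history in common, each with a "master".
for name in toy other; do
  git init -q "$work/$name"
  git -C "$work/$name" symbolic-ref HEAD refs/heads/master
  git -C "$work/$name" commit -q --allow-empty -m "first commit in $name"
done

cd "$work/toy"
# Fetch the unrelated "master" by path; no remote has to be configured.
git fetch -q "$work/other" master

toy_tip=$(git rev-parse master)
fetched_tip=$(git rev-parse FETCH_HEAD)

# With no common ancestor, "git merge-base" prints nothing and fails.
if git merge-base master FETCH_HEAD >/dev/null; then
  related=yes
else
  related=no
fi

echo "local master: $toy_tip"
echo "fetched:      $fetched_tip"
echo "related:      $related"
```

The fetch succeeds and the local "master" stays where it was; the
fetched commits simply sit next to it in the object database.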

Another feature to consider is that Git's branches are truly lightweight
and have no identity: a branch is just a pointer to its tip commit, and
all commits *reachable* from it do not record the fact they "are on
that branch" in any way (which is very different from, say, Mercurial).
It means at any time you might do something like:

  git checkout master
  git checkout -b foo
  git branch -d master

and have all commits which were reachable from "master" be now reachable
from "foo", and "master" is gone.
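
The same steps can be checked in a scratch repository.  This is only a
sketch: the two empty commits and the temporary directory are
assumptions made for the demonstration:

```shell
set -e
# Throwaway identity so the commits work on a machine with no Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m "one"
git commit -q --allow-empty -m "two"

# Remember where "master" points and how many commits it reaches.
before=$(git rev-parse master)
count_before=$(git rev-list --count master)

# The sequence from the text: new pointer "foo", then drop "master".
git checkout -q -b foo
git branch -q -d master

after=$(git rev-parse foo)
count_after=$(git rev-list --count foo)
branches=$(git for-each-ref --format='%(refname:short)' refs/heads)

echo "tip before: $before"
echo "tip after:  $after"
echo "branches:   $branches"
```

Nothing in the commits themselves changed; only the name pointing at
the tip did.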

Let's now try to combine the pieces of this puzzle to see the picture.

* Commits in any repo form a graph (or a number of graphs).
* Branches are mere pointers to single commits of that graph.
  They're there only for convenience of referring to them
  and have very little semantics on their own.
* Same-named branches in different repositories mean different things.
* Git is able to exchange parts of the commit graphs it maintains
  with any other Git repository.  Its wire protocol will try to
  minimize the amount of objects to transfer, but when doing so
  it will only consider graphs of commits: it has no notion of
  "same project", "same branch" etc.
* No matter which repositories you exchange commits with, everything
  in your local repository is "truly yours"; no branch might ever be
  created, deleted or updated there unless you told Git to do so.

So how does Git implement this approach?
It does so via two concepts: remote branches and tracking branches.

A remote branch is *a bookmark* to the state of a branch in a remote
repository as of the last time it was seen there.  Such branches are
normally created/updated (but not deleted) when you do `git fetch`.
For instance, if a remote which is known locally as "origin" is fetched
from, and at that time it contains branches "foo" and "dev", Git will
normally create/update remote branches "origin/foo" and "origin/dev"
for you.  The crucial thing is that they are not "yours": they are mere
bookmarks to remote state.  You can't directly commit to these branches
(and doing so would make no sense).  Why does Git implement it this way?
Because it also allows you to fetch "foo" and "dev" from Joe's
repository, producing "joe/foo" and "joe/dev", and from Mary's,
producing "mary/foo" and "mary/dev", and from many others, and have
them all properly "namespaced" and ready for local inspection.

Still, no matter how many branches named "foo" you did fetch from
remote repos, your local branch "foo", if any, is yours and yours only:
it's not subject to updating when you fetch "foo" from any remote
(unless you tell Git directly to do so -- it's possible).
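
A small sketch of that namespacing follows.  The three repositories
("joe", "mary" and "mine") are hypothetical and live in a temporary
directory; each has its own, unrelated branch "foo":

```shell
set -e
# Throwaway identity so the commits work on a machine with no Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

work=$(mktemp -d)

# Two contributors, each with their own branch "foo".
for who in joe mary; do
  git init -q "$work/$who"
  git -C "$work/$who" symbolic-ref HEAD refs/heads/foo
  git -C "$work/$who" commit -q --allow-empty -m "work by $who"
done

# My repository also has a "foo", and it is mine alone.
git init -q "$work/mine"
cd "$work/mine"
git symbolic-ref HEAD refs/heads/foo
git commit -q --allow-empty -m "my own work"
mine_before=$(git rev-parse foo)

git remote add joe "$work/joe"
git remote add mary "$work/mary"
git fetch -q --all

# Each remote's "foo" gets its own bookmark under refs/remotes/.
joe_foo=$(git rev-parse refs/remotes/joe/foo)
mary_foo=$(git rev-parse refs/remotes/mary/foo)
mine_after=$(git rev-parse foo)

git branch -r
```

After the fetch there are three distinct "foo"s to inspect, and the
local one has not moved.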

So when you did `git fetch` and "upstream" had new branches at that
time, you have remote branches for them -- run `git branch -r` to see
them.  Note that remote branches do not get deleted when they disappear
from their remote repository -- again simply because it's you who decide
when to delete what, and you might have legitimate reasons to still
have a handle on that old history (you can run `git remote prune
<remote>` or `git fetch --prune <remote>` to expunge such remote
branches).
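
Here is a sketch of a stale remote branch surviving a fetch and then
being pruned.  The "upstream" and "local" repositories and the branch
"topic" are invented for the example, and `fetch.prune` is forced to its
default of false in case your own configuration overrides it:

```shell
set -e
# Throwaway identity so the commits work on a machine with no Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

work=$(mktemp -d)

git init -q "$work/upstream"
git -C "$work/upstream" symbolic-ref HEAD refs/heads/master
git -C "$work/upstream" commit -q --allow-empty -m "initial"
git -C "$work/upstream" branch topic

git clone -q "$work/upstream" "$work/local"
cd "$work/local"

# Upstream deletes "topic"; a plain fetch leaves our bookmark in place.
git -C "$work/upstream" branch -q -D topic
git -c fetch.prune=false fetch -q origin
if git rev-parse -q --verify refs/remotes/origin/topic >/dev/null; then
  after_fetch=present
else
  after_fetch=gone
fi

# Pruning drops remote branches that no longer exist on "origin".
git remote prune origin >/dev/null
if git rev-parse -q --verify refs/remotes/origin/topic >/dev/null; then
  after_prune=present
else
  after_prune=gone
fi

echo "origin/topic after fetch: $after_fetch"
echo "origin/topic after prune: $after_prune"
```

`git fetch --prune origin` does the fetch and the pruning in one step.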

Now you might legitimately say that "most of the time" you want to have
your own local branches closely follow those of "upstream".
That's indeed quite a typical case, and Git makes it easier to support
via its "tracking" mechanism: you might configure any local (yours)
branch to track a single remote branch.  When you clone, and Git
creates a single branch "master", it's already set up to track the
remote branch "origin/master".  Tracking enables a whole slew of
convenient shortcuts, including hints like "your branch X is N commits
ahead of origin/X" etc.

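In recent versions of Git, all you have to do to create a local branch
for an upstream's remote branch and start tracking it is

  git checkout foo

If no local branch "foo" exists yet and exactly one remote has a branch
of that name, Git will create a local branch "foo" which will be set to
track "origin/foo".  (Note that `git checkout origin/foo` would merely
detach HEAD at that commit without creating any local branch.)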

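As a sketch of setting up tracking, the following uses the explicit
`--track` form, which spells out which remote branch is meant.  The
"upstream" and "local" repositories are made up and live in a temporary
directory:

```shell
set -e
# Throwaway identity so the commits work on a machine with no Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

work=$(mktemp -d)

git init -q "$work/upstream"
git -C "$work/upstream" symbolic-ref HEAD refs/heads/master
git -C "$work/upstream" commit -q --allow-empty -m "initial"
git -C "$work/upstream" branch foo

git clone -q "$work/upstream" "$work/local"
cd "$work/local"

# Create a local "foo" that tracks the remote branch "origin/foo".
git checkout -q --track origin/foo

current=$(git symbolic-ref --short HEAD)
upstream=$(git rev-parse --abbrev-ref 'foo@{upstream}')

echo "on branch: $current"
echo "tracking:  $upstream"
```

Once the tracking is in place, `git status` starts reporting how far
"foo" is ahead of or behind "origin/foo".
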
Now it's time to read [2] and [3].

1. http://fossil-scm.org
2. http://git-scm.com/book/en/Git-Branching-Remote-Branches
3. http://longair.net/blog/2009/04/16/git-fetch-and-merge/
