Re: [GNU-linux-libre] The "Free" Kernel In Debian Squeeze

Alexandre Oliva Sat, 01 Jan 2011 05:21:53 -0800

On Jan  1, 2011, Richard Stallman wrote:

> Maintain a table of the correspondence between version identifiers
> in Torvalds' repository and version identifiers in ours.


That's what would be prohibitive to maintain by hand.  It would be just
as prohibitive for anyone who wanted to use our repository as a base
into which to port third parties' branches, if they had to pay manual
attention to it, when git could take care of it all; and git does have
the smarts to take care of *almost* all of it.


The algorithm you described is already implemented in git rebase, except
for the note-taking.  It even pauses for manual conflict resolution, as
needed.
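
As a hedged illustration, git can be made to take the notes itself: git
rebase and git commit --amend run a post-rewrite hook, feeding it lines
of the form "<old> <new>" (one per rewritten commit) on standard input.
This minimal sketch, in a throwaway repository, installs such a hook and
has it append the pairs to a file; the branch names and the file name
rewrite-map.txt are hypothetical.

```shell
#!/bin/sh
# Sketch: let git record the old -> new commit table during a rebase.
# The toy repository, branch names and rewrite-map.txt are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.org"

# post-rewrite gets "<old> <new>" lines on stdin after a rebase or amend;
# this hook just appends them to the table.
mkdir -p .git/hooks
cat > .git/hooks/post-rewrite <<'EOF'
#!/bin/sh
cat >> "$(git rev-parse --git-dir)/rewrite-map.txt"
EOF
chmod +x .git/hooks/post-rewrite

# Make a commit whose message and file name are the given label.
mkcommit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

mkcommit base
git checkout -q -b topic
mkcommit local
old_local=$(git rev-parse HEAD)

git checkout -q master
mkcommit upstream

git checkout -q topic
git rebase -q master           # replays "local" on top of "upstream"
new_local=$(git rev-parse HEAD)

cat .git/rewrite-map.txt       # one line: "<old_local> <new_local>"
```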

Now, maintaining these notes by hand is prohibitive.  Two to three years
ago (2.6.24-2.6.27), Linux averaged about 120 commits per day, up from
about 75 commits per day the year before (2.6.20-2.6.23), and the trend
is upward: at 120 commits per day, that's more than 40,000 new table
entries a year.
http://www.schoenitzer.de/lks/lks_en.html


Even if maintaining this by hand weren't prohibitive, you'd have to
understand how git pulls, merges and rebases operate, and then hopefully
you'll agree with me that there's little point in providing a repository
that offers no significant advantage over taking our release tarballs or
running our deblob scripts on upstream tarballs or repositories, and
that it would only make sense to make it a git repository if it can
easily interoperate with third parties' repositories using the git
workflow.

There are basically two kinds of pull operations (the git equivalent of
“cvs update” from some external repository): merge and rebase.  git pull
can actually be decomposed into a git fetch followed by a git merge or a
git rebase.  fetch contacts a remote server and brings its updates into
an upstream-tracking branch in the local repository; merge or rebase
then integrates those changes into a local branch.  The difference
between them is in how the integration is performed.
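
To make the decomposition concrete, here is a minimal sketch that uses
two scratch repositories on the local disk (the names up and local are
hypothetical) in place of a remote server.  git fetch moves only the
remote-tracking branch origin/master; the local master stays put until a
separate git merge (or git rebase) integrates the fetched commit, which
here is a plain fast-forward because the local branch has no commits of
its own.

```shell
#!/bin/sh
# Sketch: "git pull" is "git fetch" followed by "git merge" (or "git rebase").
set -e
work=$(mktemp -d)
cd "$work"

# A scratch "upstream" repository with one commit.
git init -q up
cd up
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.org"
echo one > one.txt; git add one.txt; git commit -q -m one
cd ..

git clone -q up local            # "local" gets "up" as its origin remote

# Upstream moves on after the clone.
cd up
echo two > two.txt; git add two.txt; git commit -q -m two
cd ../local

before=$(git rev-parse master)
git fetch -q origin              # only refs/remotes/origin/master moves
after_fetch=$(git rev-parse master)
fetched=$(git rev-parse origin/master)

git merge -q --ff-only origin/master   # the integration step of git pull
                                       # (or: git rebase origin/master)
after_merge=$(git rev-parse master)
```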

Consider that upstream history looks like this (each hex number
identifies a commit, an arrow points from child to parent):

  ab <- 17 <- 33 <- c7

Your local branch contains the following commits:

  ab <- 17 <- 45

Note that the latest commit is a local change you made, on top of an
older state of the upstream branch.

Now say you run git pull (merge).  Your branch will now contain:

         +- 33 <- c7 <-+
         v             |
  ab <- 17            d1
         ^             |
         +---- 45 <----+

d1 is a merge commit, whose parents are both your previous commit (45)
and the upstream commit that was merged (c7), and the history of your
branch will now carry the fork.  (Such forks may also appear upstream,
as they would for anyone who happened to import your branch if you
published it at this time.)
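
The merge case can be reproduced in a throwaway repository.  In this
sketch a local branch named upstream stands in for the fetched upstream
branch, the commit messages ab, 17, 33, c7 and 45 merely label the
commits from the diagram (the actual hashes will differ), and the
mkcommit helper is hypothetical.

```shell
#!/bin/sh
# Sketch: reproduce the "git pull" (merge) diagram in a throwaway repository.
set -e
cd "$(mktemp -d)"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.org"

# Make a commit whose message (and file name) is a label from the diagram.
mkcommit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

mkcommit ab
mkcommit 17
git branch upstream              # stand-in for the fetched upstream branch
mkcommit 45                      # our local change
git checkout -q upstream
mkcommit 33
mkcommit c7
git checkout -q master

git merge -q --no-edit upstream  # creates the merge commit "d1"
git log --graph --format=%s      # shows the fork and where it joins
```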


If, instead, you'd run git pull --rebase, your branch would now
contain:

  ab <- 17 <- 33 <- c7 <- 67

where 67 carries the same change as 45, replayed on top of c7, but
there'd be nothing in the history relating 45 to 67.
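
The rebase case can be reproduced the same way, under the same
assumptions (a local upstream branch standing in for the fetched one,
commit messages as labels, a hypothetical mkcommit helper).  Afterwards
the history is linear, and the original 45 is no longer reachable from
the branch, even though its change survives as the new tip.

```shell
#!/bin/sh
# Sketch: reproduce the "git pull --rebase" diagram in a throwaway repository.
set -e
cd "$(mktemp -d)"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.org"

# Make a commit whose message (and file name) is a label from the diagram.
mkcommit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

mkcommit ab
mkcommit 17
git branch upstream              # stand-in for the fetched upstream branch
mkcommit 45
old45=$(git rev-parse HEAD)      # the original local commit
git checkout -q upstream
mkcommit 33
mkcommit c7
git checkout -q master

git rebase -q upstream           # replays 45 on top of c7, yielding "67"
new67=$(git rev-parse HEAD)
git log --format=%s              # linear: 45 (now "67"), c7, 33, 17, ab
```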


By now you've probably already guessed that we don't want merge commits
whose ancestors would carry non-Free bits into our history.

You've probably also realized that we don't want pull --rebase either,
for this would replay our local changes onto upstream's non-Free
history.  Since we're going to amend very early commits in Linux (in
fact, the very first commit needs amending, because the git repository
was created out of 2.6.12-rc2), our amended initial commit would differ
from upstream's blob-ful initial commit, so every commit in our branch
would look different from its upstream counterpart.  git would try to
replay them all, with conflicts and much pain, and the result would
carry the non-Free bits in its ancestry.  That's no good.

What we want is a rebase, but the other way round: we want to rebase
upstream changes onto our branch, so that we end up with:

  ab <- 17 <- 45 <- 62 <- b5

where 62 is the rebased (possibly rewritten) 33, and b5 is the rebased
(possibly rewritten) c7.
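
The first such update can be sketched in the same kind of toy
repository.  Here a scratch branch, hypothetically named incoming,
starts as a copy of upstream so that the upstream branch itself is left
untouched; rebasing it onto master replays upstream's 33 and c7 on top
of 45, and master then fast-forwards to the result.

```shell
#!/bin/sh
# Sketch: rebase upstream's changes onto our branch (the reverse of pull --rebase).
set -e
cd "$(mktemp -d)"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.org"

# Make a commit whose message (and file name) is a label from the diagram.
mkcommit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

mkcommit ab
mkcommit 17
git branch upstream
mkcommit 45                          # our (say, deblobbed) change
git checkout -q upstream
mkcommit 33
mkcommit c7

git checkout -q -b incoming upstream # hypothetical scratch branch
git rebase -q master                 # replays 33 and c7 on top of 45
git checkout -q master
git merge -q --ff-only incoming      # master: ab <- 17 <- 45 <- 33' <- c7'
git log --format=%s
```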

If we ask git to rebase upstream's changes onto our local branch, it
will give us pretty much what we want, at least the first time.  The
second time, it will try to replay all changes since we diverged (at 17
in the example, or at the initial commit in Linux-libre).  git rebase
attempts to detect identical changes and not re-apply them, but if
conflicts arise, it seems to me that they may have to be resolved every
time.  That's...  undesirable, to make the understatement of the year ;-)

The good news is that git rebase can give us what we want for
subsequent updates, albeit with a somewhat convoluted procedure: with
--onto, it can be told to rebase a “range” of commits onto an unrelated
branch, so we could follow this procedure:

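  git fetch upstream      # advances upstream/master
  git rebase --onto libre prev-upstream upstream/master
  # clean up new commits as needed; HEAD is left detached at the result
  git checkout -B libre   # point libre at the rebased commits
  git tag -f prev-upstream upstream/master  # remember where we stopped
  git push publish libre  # none of upstream's history goes out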
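
A toy run of this procedure might look like the sketch below.  It
assumes a remote named upstream (here a scratch repository standing in
for Torvalds' tree), a branch libre whose first commit is a hand-cleaned
copy of upstream's first commit, and a lightweight tag prev-upstream
recording the last upstream commit already incorporated; the files
code.txt and fw.bin are made up, and the final push is left out.  Since
upstream/master is a remote-tracking branch, git rebase leaves HEAD
detached at the result, so libre has to be pointed there afterwards; and
git tag -f takes the tag name first and the commit second.

```shell
#!/bin/sh
# Sketch: one incremental update of a hypothetical "libre" branch.
set -e
work=$(mktemp -d)
cd "$work"
ident() { git config user.name "Example"; git config user.email "example@example.org"; }

# Stand-in for Torvalds' tree: code plus a (pretend) non-Free blob.
git init -q linux
cd linux
git symbolic-ref HEAD refs/heads/master
ident
echo "code v1" > code.txt
echo blob > fw.bin
git add code.txt fw.bin
git commit -q -m initial
cd ..

# Our repository: libre starts from a cleaned-up copy of that commit.
git init -q ours
cd ours
git symbolic-ref HEAD refs/heads/libre
ident
git remote add upstream "$work/linux"
git fetch -q upstream
echo "code v1" > code.txt
git add code.txt
git commit -q -m "initial (deblobbed)"
git tag prev-upstream upstream/master    # last upstream commit incorporated

# Upstream moves on.
cd ../linux
echo "code v2" > code.txt
git commit -q -a -m feature
cd ../ours

# The update procedure itself.
git fetch -q upstream                    # advances upstream/master
git rebase -q --onto libre prev-upstream upstream/master
git checkout -q -B libre                 # HEAD was detached; move libre there
git tag -f prev-upstream upstream/master >/dev/null
git log --format=%s libre                # feature, initial (deblobbed)
```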

I'm not entirely sure this takes us all the way, sadly.  rebase will
flatten merge commits unless explicitly asked to preserve them
(--preserve-merges, superseded by --rebase-merges in later git
versions), and I'm not sure about the implications of that.

OTOH, git filter-branch is *designed* to rewrite history, preserving
merges and all.  I'm not sure it can be done incrementally, but I think
I see a somewhat convoluted procedure to work around that potential
limitation.  Even then, this does little to solve the actual problem of
using our published repository as part of others' git workflows: the
resulting branch, just like the result of the rebase strategy above, is
completely incompatible with upstream.  For third parties, this means no
git pull for updates from upstream, and no combining with third
parties' branches based on upstream.
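
The incompatibility is easy to see in a throwaway repository.  In this
sketch, git filter-branch removes a made-up blob file, fw.bin, from
every commit of a hypothetical branch libre copied from master.  The
merge commit survives the rewrite, but since the very first commit
changes, the rewritten branch shares no commit with the original, so git
merge-base finds no common ancestor and there is nothing for a merge or
pull to build on.

```shell
#!/bin/sh
# Sketch: filter-branch keeps merges, but the rewritten history shares no
# commit with the original, so the two can no longer be merged sensibly.
set -e
cd "$(mktemp -d)"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.org"

echo code > code.txt
echo blob > fw.bin                       # hypothetical non-Free blob
git add code.txt fw.bin
git commit -q -m initial
git checkout -q -b topic
echo topic >> code.txt
git commit -q -a -m topic
git checkout -q master
echo other > other.txt
git add other.txt
git commit -q -m other
git merge -q --no-edit topic             # the history now contains a merge

# Rewrite a copy of master, dropping fw.bin from every commit.
git branch libre master
export FILTER_BRANCH_SQUELCH_WARNING=1   # newer git pauses with a warning otherwise
git filter-branch -f --index-filter \
  'git rm -q --cached --ignore-unmatch fw.bin' -- libre >/dev/null

git rev-list --merges --count libre      # the merge survives the rewrite
git merge-base libre master || echo "no common ancestor with upstream"
```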

> I don't know the git interface, and I don't know whether the goal of
> "not breaking it" is feasible.  But I urge you not to worry about it
> too much.

Why?  What's the point of rushing to implement something that's no
better than using our release tarballs or our deblobbing scripts, rather
than releasing something that will actually be useful?

>     Creating a repository the way you suggest would make it very difficult
>     for us (or anyone else) to bring in any changes that are later installed
>     in Linus' tree, regardless of whether they need cleaning up.

> Not difficult at all -- I explained just above how to do it.

Manually performing operations that git is perfectly capable of
performing is not just difficult, it's ridiculously more difficult than
letting git do them.  If only we had to do it, it might be acceptable.
But if every user has to do it, it is nonsensical.  And that's what we
would get with an incompatible repository.

-- 
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist      Red Hat Brazil Compiler Engineer
