On Jan 1, 2011, Richard Stallman <[email protected]> wrote:

> Maintain a table of the correspondence between version identifiers
> in Torvalds' repository and version identifiers in ours.

That's what would be prohibitive to maintain by hand.  It would be just
as prohibitive for anyone who wants to use our repository as a base
onto which to port third parties' branches to pay manual attention to
it, when git could take care of it all, and it does have the smarts to
take care of *almost* all of it.

The algorithm you described is already implemented in git rebase,
except for the note-taking.  It even pauses for manual conflict
resolution, as needed.

Now, maintaining these notes by hand is prohibitive.  Two to three
years ago (2.6.24-2.6.27), Linux averaged about 120 commits per day, up
from about 75 commits per day the year before (2.6.20-2.6.23), and
trending up.

http://www.schoenitzer.de/lks/lks_en.html

Even if maintaining this by hand weren't prohibitive, you have to
understand how git pulls, merges and rebases operate, and then
hopefully you'll agree with me that there's little point in providing
a repository that offers no significant advantage over taking our
release tarballs or running our deblob scripts on upstream tarballs or
repositories, and that it would only make sense to make it a git
repository if it can easily interoperate with third parties'
repositories using the git workflow.

There are basically two kinds of pull operations (the operations that
do the equivalent of “cvs update” from some external repository): merge
and rebase.  pull can actually be decomposed into a git fetch followed
by a git merge or a git rebase.  fetch contacts a remote server and
brings the updates to an upstream branch in the local repository, then
merge or rebase integrates the changes into a local branch.  The
difference between them is in how the integration is performed.

Consider that upstream history looks like this (each hex number
identifies a commit, an arrow points from child to parent):

    ab <- 17 <- 33 <- c7

Your local branch contains the following commits:

    ab <- 17 <- 45

Note that the latest commit is a local change you made, on top of an
older state of the upstream branch.  Now say you run git pull (merge).
Your branch will now contain:

          +- 33 <- c7 <-+
          v             |
    ab <- 17            d1
          ^             |
          +---- 45 <----+

d1 is a merge commit, whose parents are both your previous commit and
the upstream commit that was merged, and the history of your branch
will now carry the fork.  (Such forks may also appear upstream, as they
would for anyone who happened to import your branch if you published it
at this time.)

If, instead, you'd said git pull --rebase, your branch would now
contain:

    ab <- 17 <- 33 <- c7 <- 67

and there'd be nothing relating 45 to 67.

By now you've probably already guessed that we don't want merge commits
whose ancestors would carry non-Free bits into our history.

You've probably also realized that we don't want pull --rebase either,
for this would replay our local changes onto upstream's non-Free
history.  Since we're going to amend very early commits in Linux (in
fact, the very first commit needs amending, because the git repository
was created out of 2.6.11 IIRC), our initial amended commit would
conflict with the upstream blob-ful initial commit, so all the commits
in our branch would look different, and git would try to replay them
all, with conflicts and much pain, and the result would carry the
non-Free bits in its ancestry.  That's no good.

What we want is a rebase, but the other way round: we want to rebase
upstream changes into our branch, so that we end up with:

    ab <- 17 <- 45 <- 62 <- b5

where 62 is the rebased (possibly rewritten) 33, and b5 is the rebased
(possibly rewritten) c7.

If we ask git to rebase upstream into our local branch, it will pretty
much give us what we want, at least the first time.  The second time,
it will try to replay all changes since we diverged (at 17, in the
example, or at the initial commit, in Linux-libre).  git rebase
attempts to detect identical changes and not re-apply them, but if
conflicts arise, it seems to me that they may have to be resolved every
time.  That's...
undesirable, for the understatement of the year ;-)

The good news is that git rebase apparently can give us what we want
for subsequent updates, although with a somewhat convoluted procedure:
it can apparently be told to rebase a “range” of commits onto an
unrelated branch, so we could follow this procedure:

    git fetch # advances upstream/master
    git rebase --onto libre prev-upstream upstream/master
    # clean up new commits as needed
    git tag -f prev-upstream upstream/master # record how far we got
    git push publish libre # none of upstream's history goes out

I'm not entirely sure this takes us all the way, sadly.  rebase will
flatten merge commits unless performed in interactive mode, and I'm not
sure about the implications of that.

OTOH, git filter-branch is *designed* to rewrite history, preserving
merges and all.  I'm not sure it can be done incrementally, but I think
I see a somewhat convoluted procedure to work around that potential
limitation.

Even then, this does little to solve the actual problem of using our
published repository as part of others' git workflow: the resulting
branch, just like the result of the rebase strategy above, is a branch
that's completely incompatible with upstream.  For third parties, this
means no git pull for updates from upstream, and no combination with
third parties' branches based on upstream.

> I don't know the git interface, and I don't know whether the goal of
> "not breaking it" is feasible.  But I urge you not to worry about it
> too much.

Why?  What's the point of rushing to implement something that's no
better than using our release tarballs or our deblobbing scripts,
rather than releasing something that will actually be useful?

>> Creating a repository the way you suggest would make it very difficult
>> for us (or anyone else) to bring in any changes that are later installed
>> in Linus' tree, regardless of whether they need cleaning up.

> Not difficult at all -- I explained just above how to do it.

Manually performing operations that git is perfectly capable of
performing is not only difficult, it's ridiculously more difficult.  If
only we had to do it, it might be acceptable.  But if every user has to
do it, it is nonsensical.  And that's what we would get with an
incompatible repository.

--
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist      Red Hat Brazil Compiler Engineer
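
Addendum: for readers less familiar with git, the ancestry argument in
the message above (merge, pull --rebase, and the "other way round"
rebase) can be checked with a toy model.  This is only a sketch under
loud assumptions: it is not git, the names ancestors, merge, replay,
example_history and UPSTREAM_ONLY are made up for illustration, commit
ids are the two-character labels from the example rather than real
hashes, and it ignores file contents and conflicts entirely.

```python
# Toy model (not git): each commit id maps to a tuple of parent ids.

def ancestors(commits, tip):
    """Return every commit reachable from tip, including tip itself."""
    seen, stack = set(), [tip]
    while stack:
        cid = stack.pop()
        if cid not in seen:
            seen.add(cid)
            stack.extend(commits[cid])
    return seen

def merge(commits, new_id, ours, theirs):
    """Record a merge commit whose parents are both branch tips."""
    commits[new_id] = (ours, theirs)
    return new_id

def replay(commits, onto, new_ids):
    """Model a rebase: one fresh single-parent commit per replayed change."""
    tip = onto
    for new_id in new_ids:
        commits[new_id] = (tip,)
        tip = new_id
    return tip

def example_history():
    # upstream: ab <- 17 <- 33 <- c7    local: ab <- 17 <- 45
    return {"ab": (), "17": ("ab",), "33": ("17",), "c7": ("33",), "45": ("17",)}

# Hypothetical stand-ins for upstream commits we do not want in our ancestry.
UPSTREAM_ONLY = {"33", "c7"}

h = example_history()
tip = merge(h, "d1", ours="45", theirs="c7")          # git pull (merge)
print("merge:", sorted(UPSTREAM_ONLY & ancestors(h, tip)))

h = example_history()
tip = replay(h, onto="c7", new_ids=["67"])            # git pull --rebase
print("pull --rebase:", sorted(UPSTREAM_ONLY & ancestors(h, tip)))

h = example_history()
tip = replay(h, onto="45", new_ids=["62", "b5"])      # upstream replayed onto ours
print("reverse rebase:", sorted(UPSTREAM_ONLY & ancestors(h, tip)))
```

In this model only the last case yields a tip whose ancestry contains
none of the upstream-only commits.  The same property is what makes the
resulting branch unrelated to upstream as far as git pull is concerned.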
