Hi,

I've been looking into converting the GNOME SVN repositories to git over the last couple of days. This email sums up the different approaches I've considered and experimented with, and at the end I describe the approach I'm recommending for doing the bulk conversion.
My first thought was to just use git svn clone, or maybe even just grab the repos from git-mirror.gnome.org, drop them in place and call it a day. Using the git-mirror repositories was quickly dismissed, since we'll want full names and saner-looking email addresses in the commit logs. Currently the git-mirror commits look like this in git (I'm using pango and Behdad as examples here):

  Author: behdad <beh...@123ab921-de25-0410-83fa-f409d9e86667>

which is sufficient, but not pretty. So we'll need to do a re-import and use a username->fullname map to get something like this:

  Author: Behdad Esfahbod <[email protected]>

We discussed different ways to generate the email address. One was to dig up a real list of email addresses, but that's problematic: a lot of contributors have changed email addresses over time, and attributing old commits to a new email address (ie one at a new employer) would be unfortunate. Another option was to extract the address automatically from the ChangeLog or commit message, but that's going to be messy and fragile. So in the end we're recommending generic [email protected] email addresses for the conversion. Whether it should be src, git, scm, vcs, dvcs or whatever is a wonderful bikeshedding subject. There's no requirement that the addresses be working email addresses. I have src in my script now, and that's going to be hard to change... I mean, I'll have to edit the file and stuff, so I suggest we just go with src unless there's a really good reason not to.

Ok, so to reconvert I started with git svn clone. This tool takes an author map that lets us map the usernames to full names as discussed above, but it has a couple of problems:

  1) it creates an empty commit for every branch and tag
  2) it's very slow, even when you have everything locally.

The empty commits come from the fact that creating a branch or tag in SVN means doing an svn copy of the branch, which introduces a new revision with no changes to the source code.
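As a minimal sketch of the username->fullname map: git svn clone reads its author map from a file passed with --authors-file, one "svnuser = Full Name <email>" entry per line. The Python below renders such a file from a dict. The function name authors_file and the domain src.example.org are hypothetical placeholders, not the address scheme settled on above.

```python
from typing import Dict

# Hypothetical placeholder domain; substitute whatever address scheme is agreed on.
DOMAIN = "src.example.org"


def authors_file(names: Dict[str, str], domain: str = DOMAIN) -> str:
    """Render a username -> full name map in git svn's authors-file format.

    Each line reads "svnuser = Full Name <svnuser@domain>". The addresses are
    generic and need not be deliverable.
    """
    lines = []
    for user in sorted(names):
        full_name = names[user].strip() or user  # fall back to the username
        lines.append(f"{user} = {full_name} <{user}@{domain}>")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # The table here is illustrative; the real one would list every committer.
    print(authors_file({"behdad": "Behdad Esfahbod"}), end="")
```

The output would be saved as e.g. authors.txt and passed as `git svn clone --authors-file=authors.txt <url>`. Names with non-ASCII characters should be written out as UTF-8.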
git svn clone doesn't filter these empty commits out, so we end up with a commit graph that looks something like this:

  http://people.freedesktop.org/~krh/pango-git-mirror-gitk.png

ie, the PANGO_1_9_1 tag is sitting on a little branch of its own instead of pointing to a commit that is actually on the 1.9 branch. Being slow is less of a problem, but the faster we can convert the repositories the better.

A little googling finds the svn-all-fast-export tool. This tool is written by Thiago Macieira from the KDE project. It uses the git fast-import feature and is designed to do a one-shot import of a big SVN repository (the KDE repository) and optionally split it into multiple git repositories in the process. It detects and excludes the empty commits inherent in how SVN represents branching and tagging, and it's very fast: it imports evolution in half an hour.

Now, the problem with this tool is that when comparing all tags in the original SVN repository and the new git repository, some of the tags differ. The git-mirror repositories match the SVN repositories tag for tag, so in this respect git svn clone is better. However, a little digging reveals that the tags svn-all-fast-export doesn't handle are the tags that were carried over by the CVS to SVN conversion. That is still a problem with svn-all-fast-export, but those SVN tags are actually badly broken. Take a look at

  http://svn.gnome.org/viewvc/gconf?view=revision&revision=1837

which is supposed to be the 2.6.4 tag for GConf. Notice how the /tags/GCONF_2_6_4 directory is recorded as being copied from trunk, not from /branches/gnome-2-6. ChangeLog and many other files, on the other hand, come from the 2.6 branch, replacing whatever was in the directory that was copied from trunk. Just as a reminder, this is what a tag is supposed to look like in SVN:

  http://svn.gnome.org/viewvc/pango?view=revision&revision=2736

(copied from trunk because pango doesn't have a 1.22 branch yet, so that's ok).
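The two kinds of tag revisions described above can be told apart mechanically from the revision's changed-path list (what `svn log -v` prints). The sketch below is a hypothetical illustration, not how svn-all-fast-export is implemented: it assumes the standard trunk/branches/tags layout at the repository root, and the names ChangedPath and classify_copy are made up for this example. A clean copy, like the pango tag, needs no commit of its own; a copy followed by further changes inside the new directory, like the GConf tag, is flagged.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ChangedPath:
    """One changed-path entry of an SVN revision, as listed by `svn log -v`."""
    action: str                          # "A", "M", "D" or "R"
    path: str                            # e.g. "/tags/GCONF_2_6_4"
    copyfrom_path: Optional[str] = None  # e.g. "/trunk"
    copyfrom_rev: Optional[int] = None


def classify_copy(changes: List[ChangedPath],
                  prefixes: Tuple[str, ...] = ("/tags/", "/branches/")):
    """Classify a revision that creates a tag or branch.

    ("clean", source):   a single directory copy and nothing else; the ref can
                         point at the existing commit for the copy source, so
                         no empty commit is needed.
    ("patched", source): the copy is followed by further changes inside the new
                         directory (files replaced from another branch, etc.).
    ("other", None):     not a tag/branch creation in the assumed layout.
    """
    # The new tag/branch root: a copied directory directly under /tags or /branches.
    roots = [c for c in changes
             if c.action == "A" and c.copyfrom_path is not None
             and c.path.startswith(prefixes) and c.path.count("/") == 2]
    if len(roots) != 1:
        return ("other", None)
    root = roots[0]
    source = (root.copyfrom_path, root.copyfrom_rev)
    extra = [c for c in changes if c is not root]
    if not extra:
        return ("clean", source)
    # Every other change touches files inside the new tag/branch directory.
    if all(c.path.startswith(root.path + "/") for c in extra):
        return ("patched", source)
    return ("other", None)
```

Revisions classified as "patched" are the ones worth inspecting by hand; in the approach described below, their history is rebuilt from CVS instead.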
So the SVN tags are badly messed up, and branches have a similar problem:

  http://svn.gnome.org/viewvc/gtk%2B?view=revision&revision=12227

The good news is that we can fix this. The process I'm using now is a little complicated, but it undoes the damage from the CVS to SVN import and preserves the history, tags and branches better than git svn clone. The basic idea is to redo the import from CVS directly to git and then replay the SVN activity on top of that.

For the CVS to git import I'm using Keith Packard's parsecvs tool. We used Keith's tool for importing all of X.org (after splitting the monolithic CVS repository into all the components we have now), and we've used it for mesa, libdrm, hal and many other repositories on freedesktop.org. It's a great tool: it's fast and it handles all kinds of special cases and brokenness usually encountered in old, hand-trimmed CVS repositories (because, well, you should see the XFree86 CVS repository...).

Then, to import the SVN activity after the CVS to SVN conversion, I'm using svn-all-fast-export. The only problem I've seen with that tool is that it got confused by the broken SVN tags and ended up with different tag contents than the SVN repo, but since I'm only using it for importing the part of history that originated in SVN, that shouldn't be a problem.

Once we have a complete import, I'll put the repos up so people can help verify them. I have a script to compare the contents of all branches between a git repo and an SVN repo, and I'm working on a tool to compare blame output to the extent that it's possible. For blame lines that map to an SVN commit, we can map from the git commit to the SVN revision number using the comment in the git commit. For blame lines that reach into CVS commits, we can't easily determine whether the commit git gives us matches what SVN gives us. I think we can compare the commit messages to see if the commits are the same, but on the other hand, I don't know how much I trust the SVN import of CVS history now.
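The blame comparison can be sketched roughly as follows. This assumes each SVN-era git commit carries a metadata line such as "svn path=/trunk/; revision=N" (or git svn's "git-svn-id: URL@N UUID"); the patterns would need adjusting to whatever the import actually writes. The function names svn_revision and compare_blame are hypothetical, and parsing the raw git blame and svn blame output into per-line lists is left out.

```python
import re
from typing import Dict, List, Optional, Tuple

# Assumed metadata formats; adjust to what the import actually writes.
_REV_PATTERNS = [
    re.compile(r"^svn path=\S*; revision=(\d+)", re.MULTILINE),
    re.compile(r"^git-svn-id: \S+@(\d+) ", re.MULTILINE),
]


def svn_revision(message: str) -> Optional[int]:
    """Return the SVN revision recorded in a git commit message, or None."""
    for pattern in _REV_PATTERNS:
        match = pattern.search(message)
        if match:
            return int(match.group(1))
    return None


def compare_blame(git_blame: List[str], messages: Dict[str, str],
                  svn_blame: List[int]) -> Tuple[List[int], List[int]]:
    """Compare per-line blame for one file from git and from SVN.

    git_blame: commit id for each line, from git blame
    messages:  commit id -> full commit message
    svn_blame: SVN revision for each line, from svn blame
    Returns (mismatched_lines, cvs_lines) as 1-based line numbers.
    """
    if len(git_blame) != len(svn_blame):
        raise ValueError("file contents differ; compare branch contents first")
    mismatched, cvs_lines = [], []
    for lineno, (commit, svn_rev) in enumerate(zip(git_blame, svn_blame), start=1):
        rev = svn_revision(messages[commit])
        if rev is None:
            cvs_lines.append(lineno)  # from the CVS import: no SVN revision to check
        elif rev != svn_rev:
            mismatched.append(lineno)
    return mismatched, cvs_lines
```

Lines in the second list come from the parsecvs part of the history, so they can only be checked by comparing commit messages or by going back to CVS.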
So if we really want to verify the lines that reach into CVS history, we should consult CVS directly for them rather than the SVN import. I'll put my scripts in git somewhere once I get them to a point where they're generally useful and don't need too much handholding.

Alright, this mail is already too long, but it sums up where we are with converting the repositories. I'll send out a heads-up once we get some projects online.

cheers,
Kristian

_______________________________________________
Gnome-infrastructure mailing list
[email protected]
http://mail.gnome.org/mailman/listinfo/gnome-infrastructure
