I know many people use git, so I wanted to share a neat tip I figured out this morning that lets you graft the pre-split history into the post-split repositories. I'm using git 1.7.1 and am not sure how new these features are. Here are the steps:

1) Check out the git repos from git.apache.org into git/hadoop-common, git/hadoop-mapreduce, and git/hadoop-hdfs

2) Set up the common repo as an "alternate object store" for mr and hdfs:

$ echo "/path/to/git/hadoop-common/.git/objects" > /path/to/git/hadoop-hdfs/.git/objects/info/alternates
$ echo "/path/to/git/hadoop-common/.git/objects" > /path/to/git/hadoop-mapreduce/.git/objects/info/alternates

This allows you to look at hashes from common from within your MR or HDFS repos. Note that if you move the paths later you'll have to update this file!

3) Set up grafts for the beginning of MR/HDFS history to the pre-split commit in common:

$ echo 546d96754ffee3142bcbbf4563c624c053d0ed0d 6c16dc8cf2b28818c852e95302920a278d07ad0c > git/hadoop-mapreduce/.git/info/grafts
$ echo 6a3ac690e493c7da45bbf2ae2054768c427fd0e1 6c16dc8cf2b28818c852e95302920a278d07ad0c > git/hadoop-hdfs/.git/info/grafts

Now when you use commands like git log --follow or git blame, it will pick up changes from pre-split as if it were one repository.

Hope others find this useful as well!

-Todd

-- 
Todd Lipcon
Software Engineer, Cloudera
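
P.S. One way to sanity-check steps 2 and 3 before touching real checkouts is a throwaway sandbox. The script below is only a sketch under stated assumptions, not part of the original recipe: the repo names `common` and `hdfs`, the helper `make_repo`, and the temp directory are all made up for the demo, and it needs a git new enough to support `git -C`. It also swaps in `git replace --graft` for the `.git/info/grafts` file, since newer git releases deprecate the grafts file in favor of replace refs. On an older git such as 1.7.1 the echo commands from step 3 are the way to go.

```shell
#!/bin/sh
# Sandbox sketch: two tiny unrelated repos stand in for hadoop-common and
# hadoop-hdfs. Every name and path here is invented for the demo.
set -e

# Fixed identity so commits work even without a global git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

sandbox=$(mktemp -d)
cd "$sandbox"

# Hypothetical helper: create a repo containing one empty commit.
make_repo() {
    git init -q "$1"
    git -C "$1" commit -q --allow-empty -m "$2"
}

make_repo common "pre-split history"
make_repo hdfs "first post-split commit"

common_tip=$(git -C common rev-parse HEAD)
hdfs_root=$(git -C hdfs rev-parse HEAD)

# Step 2: point hdfs at common's object store (absolute path).
echo "$sandbox/common/.git/objects" > hdfs/.git/objects/info/alternates

# The commit from common should now be readable from inside hdfs.
alt_type=$(git -C hdfs cat-file -t "$common_tip")
echo "type seen through alternates: $alt_type"

# Step 3, using `git replace --graft` rather than .git/info/grafts:
# give the root commit of hdfs the tip of common as its parent.
git -C hdfs replace --graft "$hdfs_root" "$common_tip"

# History in hdfs should now run back into common.
commit_count=$(git -C hdfs rev-list --count HEAD)
echo "commits reachable from hdfs HEAD: $commit_count"
```

If everything is wired up, the first line should report `commit` and the second should report 2 rather than 1. As with the alternates file, the replace ref is local to the repository where it was created, so each checkout needs its own.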