I know many people use git, so I wanted to share a neat tip I figured out this
morning that lets you graft the pre-split history into the post-split
repositories. I'm using git 1.7.1 and am not sure how new these features are.
Here are the steps:

1) Check out the git repos from git.apache.org into git/hadoop-common,
git/hadoop-mapreduce, and git/hadoop-hdfs

2) Set up the common repo as an "alternate object store" for mr and hdfs:

$ echo "/path/to/git/hadoop-common/.git/objects" > /path/to/git/hadoop-hdfs/.git/objects/info/alternates
$ echo "/path/to/git/hadoop-common/.git/objects" > /path/to/git/hadoop-mapreduce/.git/objects/info/alternates

This lets you look up objects from common by hash from within your MR or HDFS
repos. Note that if you later move the repositories, you'll have to update
these alternates files!
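
To check that an alternates entry is being picked up, here is a minimal sketch
that uses two throwaway repositories in a temp directory rather than the real
Hadoop checkouts. The directory names and the placeholder commit identity are
made up for illustration, and it assumes a git new enough to have the -C option.

```shell
#!/bin/sh
set -e
# Throwaway sandbox; "common" and "hdfs" here are illustrative stand-ins.
demo=$(mktemp -d)
git init -q "$demo/common"
git init -q "$demo/hdfs"

# Make one commit in "common" so there is an object to look up.
git -C "$demo/common" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "pre-split commit"
presplit=$(git -C "$demo/common" rev-parse HEAD)

# Point hdfs at common's object store (absolute path, as in step 2).
echo "$demo/common/.git/objects" > "$demo/hdfs/.git/objects/info/alternates"

# Ask hdfs for the type of an object that only exists in common.
objtype=$(git -C "$demo/hdfs" cat-file -t "$presplit")
echo "$objtype"
```

In the real setup the equivalent check would be running git cat-file -t on the
pre-split hash from inside hadoop-hdfs or hadoop-mapreduce; if the alternates
path is wrong, git reports that the object cannot be found.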

3) Set up grafts for the beginning of MR/HDFS history to the pre-split
commit in common:

$ echo 546d96754ffee3142bcbbf4563c624c053d0ed0d 6c16dc8cf2b28818c852e95302920a278d07ad0c > git/hadoop-mapreduce/.git/info/grafts
$ echo 6a3ac690e493c7da45bbf2ae2054768c427fd0e1 6c16dc8cf2b28818c852e95302920a278d07ad0c > git/hadoop-hdfs/.git/info/grafts

Now when you use commands like git log --follow or git blame, they will pick
up pre-split changes as if it were all one repository.
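
For anyone who wants to try the mechanism before touching real checkouts, the
following is a rough sketch of steps 2 and 3 on two toy repositories. Everything
in it (directory names, commit messages, the empty_commit helper, the placeholder
identity) is hypothetical, and it assumes the installed git still honors the
info/grafts file and supports -C and rev-list --count.

```shell
#!/bin/sh
set -e
demo=$(mktemp -d)
git init -q "$demo/common"
git init -q "$demo/hdfs"

# empty_commit <repo> <message>: hypothetical helper, commits with a dummy identity.
empty_commit() {
    git -C "$1" -c user.name=demo -c user.email=demo@example.com \
        commit -q --allow-empty -m "$2"
}

empty_commit "$demo/common" "pre-split work"
empty_commit "$demo/hdfs" "first post-split commit"
parent=$(git -C "$demo/common" rev-parse HEAD)
child=$(git -C "$demo/hdfs" rev-parse HEAD)

# Step 2: let hdfs read objects from common's object store.
echo "$demo/common/.git/objects" > "$demo/hdfs/.git/objects/info/alternates"

# Step 3: each graft line is "<commit> <parent>" on a single line.
mkdir -p "$demo/hdfs/.git/info"
echo "$child $parent" > "$demo/hdfs/.git/info/grafts"

# Count commits reachable from HEAD in hdfs; the graft makes the
# commit from common reachable as the parent of hdfs's root commit.
count=$(git -C "$demo/hdfs" rev-list --count HEAD)
echo "$count"
```

Deleting the grafts file in the toy hdfs repository and rerunning the count is
a quick way to see the difference the graft makes.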

Hope others find this useful as well!
-Todd
-- 
Todd Lipcon
Software Engineer, Cloudera
