Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change 
notification.

The "GitAndHadoop" page has been changed by SteveLoughran.
The comment on this change is: more on branching.
http://wiki.apache.org/hadoop/GitAndHadoop?action=diff&rev1=4&rev2=5

--------------------------------------------------

  == Before you begin ==
  
   1. You need a copy of Git on your system. Some IDEs ship with Git support; this page assumes you are using the command line.
-  1. You need a copy of ant 1.7+ on your system for the builds themselves.
+  1. You need a copy of Ant 1.7+ on your system for the builds themselves.
-  1. You need to be online for your first checkout and build.
+  1. You need to be online for your first checkout and build, and any 
subsequent build which needs to download new artifacts from the central JAR 
repositories.
   1. You need to set Ant up so that it works with any proxy you have. This is documented by [[http://ant.apache.org/manual/proxy.html |the Ant team]].
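  
  For example, on a Unix shell you can point Ant at a proxy by setting the standard JVM proxy properties before running a build; the host name and port below are placeholders for your own proxy:
  {{{
  # tell the JVM that runs Ant about the HTTP proxy (values are examples only)
  export ANT_OPTS="-Dhttp.proxyHost=proxy.example.com -Dhttp.proxyPort=8080"
  }}}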
  
  
@@ -35, +35 @@

  }}}
  The total download is well over 100MB, so the initial checkout process works 
best when the network is fast. Once downloaded, Git works offline.
  
+ == Forking onto GitHub ==
+ 
+ You can create your own fork of the ASF project, and add whatever branches and changes you like. GitHub prefers that you explicitly fork its copies of Hadoop.
+ 
+  1. Create a GitHub login at http://github.com/ and add your public SSH keys.
+  1. Go to http://github.com/apache and search for the Hadoop and other Apache projects you want (Avro is handy alongside the others).
+  1. For each project, fork. This gives you your own repository URL, which you can then clone locally with {{{git clone}}}.
+  1. For each patch, branch.
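+ 
+ A minimal command-line sketch of that workflow; {{{yourlogin}}} and the issue name {{{HADOOP-1234}}} are placeholders, and {{{hadoop-common}}} stands in for whichever project you forked:
+ {{{
+ # clone your GitHub fork of hadoop-common (substitute your own account name)
+ git clone git@github.com:yourlogin/hadoop-common.git
+ cd hadoop-common
+ # one branch per JIRA issue you work on
+ git checkout -b HADOOP-1234
+ }}}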
+ 
  == Building the source ==
  
  You need to tell all the Hadoop modules to get a local JAR of the bits of Hadoop they depend on. You do this by making sure your Hadoop version does not match anything public, and by using the "internal" repository of locally published artifacts.
@@ -55, +64 @@

  hadoop-mapred.version=${version}
  }}}
  
+ The {{{resolvers}}} property tells Ivy to look in the local Maven artifact repository for versions of the Hadoop artifacts; if you don't set this then only published JARs from the central repository will get picked up.
+ 
+ The {{{version}}} property, and the per-project version properties derived from it, tell Hadoop which version of the artifacts to create and use. Set this to something different from (ideally ahead of) what is being published, to ensure that your own artifacts are picked up.
+ 
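+ As a rough illustration, such a {{{build.properties}}} might look like the sketch below. Only {{{resolvers}}}, {{{version}}} and {{{hadoop-mapred.version}}} appear on this page; the other per-project lines assume the same pattern, and the version string itself is invented:
+ {{{
+ # look in the local m2 repository of locally published artifacts first
+ resolvers=internal
+ # an invented version string, ahead of anything published
+ version=0.22.0-dev
+ hadoop-common.version=${version}
+ hadoop-hdfs.version=${version}
+ hadoop-mapred.version=${version}
+ }}}
+ 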
  Next, symlink this file to every Hadoop module. Now a change in the file gets 
picked up by all three.
  {{{
- ln -s build.properties hadoop-common/build.properties
- ln -s build.properties hadoop-hdfs/build.properties
- ln -s build.properties hadoop-mapreduce/build.properties
+ pushd hadoop-common; ln -s ../build.properties build.properties; popd
+ pushd hadoop-hdfs; ln -s ../build.properties build.properties; popd
+ pushd hadoop-mapreduce; ln -s ../build.properties build.properties; popd
  }}}
  
- You are all set up to build.
+ You are now all set up to build.
  
  === Build Hadoop ===
  
@@ -72, +85 @@

  
  This Ant target not only builds the JAR files, it also copies them to the local {{{${user.home}/.m2}}} directory, where they will be picked up by the "internal" resolver. You can check that this is taking place by running {{{ant ivy-report}}} on a project and seeing where it gets its dependencies.
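  
  One quick way to confirm that fresh artifacts really are being published locally (assuming the default {{{~/.m2}}} location) is to look at the timestamps in the Hadoop group directory and then re-run the Ivy report:
  {{{
  # freshly published Hadoop artifacts should show up here with current timestamps
  ls -l ~/.m2/repository/org/apache/hadoop/
  # the report should show dependencies being resolved via the "internal" resolver
  ant ivy-report
  }}}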
  
- If there are problems, don't be afraid to {{{rm -rf 
~/.m2/repository/org/apache/hadoop}}}  and {{{rm -rf 
~/.ivy2/cache/org.apache.hadoop}}} to remove local copies of artifacts.
- 
  === Testing ===
  
  Each project comes with lots of tests; run {{{ant test}}} to run them. If you 
have made changes to the build and tests fail, it may be that the tests never 
worked on your machine. Build and test the unmodified source first. Then keep 
an eye on both the main source and any branch you make. A good way to do this 
is to give a Continuous Integration server such as Hudson this job: checking 
out, building and testing both branches.
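  
  If you do not want to set up a full Hudson job straight away, even a small script that checks out, builds and tests each branch in turn will catch most surprises; the branch names below are placeholders for your own:
  {{{
  # build and test each branch from a clean state; stop at the first failure
  for branch in trunk HADOOP-1234 ; do
    git checkout "$branch" && ant clean test || { echo "FAILED on $branch" ; break ; }
  done
  }}}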
  
+ == Branching ==
+ 
+ Git makes it easy to branch. The recommended process for working with Apache projects is one branch per JIRA issue: that makes it easy to isolate and track the development of each change. It does mean that if you release a branch of your own, one that merges in more than one issue, you have to invest some effort in merging everything together. Try not to make changes in different branches that would be hard to merge.
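+ 
+ For example, a private release branch that pulls several issue branches together might be assembled like this; every branch name here is a placeholder:
+ {{{
+ # start a release branch from whatever your main line is called
+ git checkout -b my-release trunk
+ # merge in each per-issue branch, resolving any conflicts as you go
+ git merge HADOOP-1234
+ git merge HDFS-5678
+ }}}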
+ 
+ One thing you need to look out for is making sure that you build the different Hadoop projects against matching branches; that you have not published artifacts from one branch and then built against them from another. This is because both Ivy and Maven publish artifacts to shared repository cache directories.
+ 
+  1. Don't be afraid to {{{rm -rf ~/.m2/repository/org/apache/hadoop}}}  and 
{{{rm -rf ~/.ivy2/cache/org.apache.hadoop}}} to remove local copies of 
artifacts.
+  1. Use different version properties in different branches, to ensure that artifacts from one branch are not accidentally picked up by another (see the sketch after this list).
+  1. Avoid using {{{latest.version}}} as the version marker in Ivy, as that simply gives you whatever was built last.
+  1. Don't build/test different branches simultaneously, such as by running Hudson on your local machine while developing at the console. The trick here is to bring up Hudson in a virtual machine, running against the Git repository on your desktop; Git lets you do this, and it means Hudson can test your private branch in isolation.
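+ 
+ For instance, with the {{{build.properties}}} scheme described above, each branch can carry its own invented version string so its artifacts never collide with another branch's:
+ {{{
+ # on the HADOOP-1234 branch
+ version=0.22.0-HADOOP-1234
+ # on a different issue branch
+ version=0.22.0-HDFS-5678
+ }}}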
+ 
