Todd Lipcon wrote:
>> If someone could write up some kind of "post-split transition guide"
>> on the Wiki I think that would be generally appreciated.
Here's something I wrote up for re-setting up Eclipse after the split in
a way that gives relatively seamless access to the various projects'
source code. If it works for other people, it could be part of the wiki.
Setting up Eclipse post-split:
I believe my Eclipse dev environment is set up for near-seamless
inter-project development following the Great Split of 2009. Here's how
I did it, step-by-step, with no guarantee as to orthodoxy, correctness
or appropriateness. Please let me know, or update, with corrections.
Obligatory works-on-my-machine and YMMV.
With these instructions, you'll be able to work on all three projects at
once. Without them, trying to look at the source of a class defined in
Common, such as Text, from another project will show you the bytecode in
the included jar file. You can attach the source for review purposes, but
that still leaves the problem of Eclipse running the code from the jar,
rather than from the other project, in the event that you've modified
both. These instructions fix this problem so that any changes, say in
Common, are also picked up by HDFS or MapReduce.
These instructions assume svn, specifically svn from the command line.
They will work just as well with git once those repos are set up, and
also with the svn or git plugin from within Eclipse.
1. Check out each of the new repositories. Here are the directories that
I picked:

svn checkout https://svn.apache.org/repos/asf/hadoop/common/trunk hadoop-common
svn checkout https://svn.apache.org/repos/asf/hadoop/hdfs/trunk hadoop-hdfs
svn checkout https://svn.apache.org/repos/asf/hadoop/mapreduce/trunk hadoop-mapreduce
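
Equivalently, as a small loop (same checkouts as above, just
consolidated; a sketch, assuming svn is on your PATH and you want the
same directory names):

for proj in common hdfs mapreduce; do
  svn checkout "https://svn.apache.org/repos/asf/hadoop/$proj/trunk" "hadoop-$proj"
done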
2. Start Eclipse. For each of the new directories, create a new Java
project using that directory name, letting Eclipse do its standard work
of importing the project. It was previously necessary to change
Eclipse's default build directory (bin) to something else to keep it
from wiping out Hadoop's bin directory. At the moment, only the Common
project has a bin directory, but I still changed each of my Eclipse
build directories (set on the second screen of the new-project wizard)
to build/eclipse-files, both in case a bin directory is added in the
future and because it's tidier.
3. Ensure that ANT_HOME is defined in Eclipse's classpath variables
(Preferences -> Java -> Build Path -> Classpath Variables). Mine is set
to the standard /usr/share/ant.
4. From the Navigator window, right-click on each project's build.xml
and choose Run As -> Ant Build... (the second entry, which lets you pick
targets). Among the targets, select compile,
compile-{common,hdfs,mapred}-test, and eclipse-files. Let Eclipse do its
work; when it's done, each of the projects should compile successfully
and work correctly independently.
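
If you prefer, the same targets can be run from the command line rather
than from inside Eclipse. A sketch, assuming ant is installed and you
start from the directory containing the three checkouts (note each
project gets its own compile-*-test target):

cd hadoop-common && ant compile compile-common-test eclipse-files
cd ../hadoop-hdfs && ant compile compile-hdfs-test eclipse-files
cd ../hadoop-mapreduce && ant compile compile-mapred-test eclipse-files

After this, refresh the projects in Eclipse (F5) so it picks up the
generated files.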
5. To allow the projects to call directly into each other's code,
rather than relying on the bundled libraries, connect the projects as
dependencies. For each project, set up the natural dependency (HDFS
relies on Common; MapReduce relies on Common and HDFS). Right-click on
each project and go to Build Path -> Configure Build Path -> Projects
tab. For HDFS, add Common. For MapReduce, add Common and HDFS.
Unfortunately, you can't just add everything to everything, as Eclipse
detects that as a cycle and reports errors.
6. Finally, to force Eclipse to look at the other projects' source
code, rather than the included jar files, remove those jars from the
build path of each project. From HDFS, remove the common (core) jar
from the build path. From MapReduce, remove the hdfs and common (core)
jars from the build path. HDFS still has a dependency on MapReduce for
tests, and I couldn't get Eclipse to let me remove the MapReduce jar
from the HDFS project; if anyone figures out how, please update this
document or let me know.
7. All should be well. Now, for instance, if you control-click on a
class defined in Common from HDFS (say, Text), you are brought to its
definition in the Common project, as expected. And if you modify code
in Common and run a test from HDFS, it'll pull in the modified code
from Common. This doesn't solve the problem of needing to generate
patches for each of the projects when your changes affect more than one
of them, however.
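
Generating those per-project patches is just a matter of running svn
diff in each affected checkout. For example (issue numbers here are
hypothetical placeholders; substitute your actual JIRA numbers):

cd hadoop-common && svn diff > ../HADOOP-XXXX.patch
cd ../hadoop-hdfs && svn diff > ../HDFS-XXXX.patch
cd ../hadoop-mapreduce && svn diff > ../MAPREDUCE-XXXX.patch

Each patch then gets attached to its own project's JIRA issue.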