There are two separate issues you are asking about here:
1. How to modify/add to the Hadoop code and run the changed version -
Eclipse is just an IDE; it doesn't matter whether you use Eclipse or
some other editor.
I have been using Eclipse. What I do is modify the code in Eclipse and
then run "ant jar" in the root folder of Hadoop (you could also
configure this to run directly from Eclipse). This regenerates the
jars and puts them in the build/ folder. Now you can either copy these
jars into the Hadoop root folder (removing "dev" from their names) so
that they replace the original jars, or modify the scripts in bin/ to
point to the newly generated jars.
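
For example, a rough sketch of that cycle from the Hadoop root folder
(the exact jar file names are only illustrative here; they depend on
your Hadoop version):

    # rebuild the jars after editing the source
    ant jar

    # the fresh jars land in build/ with "dev" in their names
    ls build/hadoop-*-dev*.jar

    # either copy a jar over the original, dropping "dev" from the
    # name so the bin/ scripts pick it up unchanged (the version
    # number below is made up)...
    cp build/hadoop-0.13.0-dev.jar hadoop-0.13.0.jar

    # ...or edit the CLASSPATH setup in bin/hadoop to point at build/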

2. How to debug using an IDE -
This page gives a high-level intro to debugging Hadoop:
http://wiki.apache.org/lucene-hadoop/HowToDebugMapReducePrograms
In my opinion, there are two ways you can debug Hadoop programs: run
Hadoop in local mode and debug in-process in the IDE, or run Hadoop in
distributed mode and remote-debug using the IDE.

The first way is easy. At the end of the bin/hadoop script there is an
exec command; replace it with an echo command and run your program.
You can then see the parameters the script passes when starting
Hadoop. Use these same parameters in the IDE and you can debug Hadoop.
Remember to change the conf files so that Hadoop runs in local mode.
To be more specific, you will have to set the program arguments and VM
arguments, and add an entry to the classpath pointing to the conf
folder.
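
As a sketch of what that looks like (the launch line varies a bit
between versions, and the jar/class names below are just placeholders
for your own job, so treat this as an illustration rather than the
literal lines):

    # near the end of bin/hadoop the launch line looks roughly like:
    #   exec "$JAVA" $JAVA_HEAP_MAX $HADOOP_OPTS -classpath "$CLASSPATH" $CLASS "$@"
    # change exec to echo so the script prints the command instead of
    # running it:
    echo "$JAVA" $JAVA_HEAP_MAX $HADOOP_OPTS -classpath "$CLASSPATH" $CLASS "$@"

    # now e.g. "bin/hadoop jar myjob.jar MyJob input output" prints
    # the full java command line; copy its VM arguments and program
    # arguments into an Eclipse run configuration for that class, and
    # add the conf/ folder (with local-mode settings) to the
    # configuration's classpath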

The second method is more complicated. You will have to modify the
scripts and pass some extra params like "-Xdebug
-Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=<port>" to
the java command. Specify the <port> of your choice in it. On the
server where you run both the namenode and the jobtracker there will
be a conflict, since the same port would be used for both, so you will
have to do some intelligent scripting to take care of this. Once the
java processes start you can attach the Eclipse debugger to that
machine's <port> and set breakpoints. Up to this point you can debug
everything that happens before the map/reduce tasks. Map/reduce tasks
run in separate processes; you will have to figure out debugging them
yourself.
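
A minimal sketch of those extra params, assuming you wire them in
through HADOOP_OPTS in conf/hadoop-env.sh (the port number is an
arbitrary choice, and with a single HADOOP_OPTS every daemon on a host
gets the same port, which is exactly the conflict described above):

    # conf/hadoop-env.sh
    # start each daemon's JVM with a JDWP debug agent listening on 8000
    export HADOOP_OPTS="$HADOOP_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8000"

    # then attach the Eclipse debugger as a Remote Java Application,
    # connecting to that machine on port 8000, and set your breakpoints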

The best way is to debug using the first approach (as the above link
says). I think that approach will let you fix any map-reduce related
problem; for other, purely distributed kinds of problems you can
follow the second approach.

~ Neeraj

-----Original Message-----
From: KrzyCube [mailto:[EMAIL PROTECTED] 
Sent: Thursday, June 21, 2007 2:08 AM
To: hadoop-user@lucene.apache.org
Subject: How to Start Hadoop Cluster from source code in Eclipse


Hi, all:

I am using Eclipse to view the Hadoop source code, and I want to trace
it to see how it works. I wrote some code to call the FSClient, and
when I step into the RPC object, I cannot go any deeper.

So I just want to start the cluster from the source code, which I am
holding in Eclipse now.
I browsed the start-*.sh scripts and found that they start several
daemons, such as the namenode, datanode and secondarynamenode. I just
don't know how to figure this out.

Or is there any way to attach my code to a running process, just as
gdb does while debugging C code?

Does anybody use Eclipse to debug this source code? Please give me
some tips.

Thanks.
KrzyCube
--
View this message in context:
http://www.nabble.com/How-to-Start-Hadoop-Cluster-from-source-code-in-Eclipse-tf3957457.html#a11229322
Sent from the Hadoop Users mailing list archive at Nabble.com.
