There is definitely something to be said for developing via TDD as
Lohit mentioned.
Hadoop has an extensive set of tools for writing unit tests that run
on simulated clusters (see http://www.cloudera.com/blog/2008/12/16/testing-hadoop/
for an excellent tutorial). This will save you time in the long run,
because your tests can be contributed along with the actual patch,
and there's no need to muck about with configuring clusters, manually
starting datanodes, etc.
Actually needing a cluster to test or develop patches against is
pretty rare and indicative of a problem somewhere else.
-Jakob
On Mar 4, 2009, at 11:08 AM, Raghu Angadi wrote:
Ajit Ratnaparkhi wrote:
Hi,
Thanks for your help.
I tried the above-mentioned script (the one posted by Raghu), but
whenever I execute it, the following message is displayed:
*datanode running as process <process_id>. Stop it first.*
I start the single-node cluster with bin/start-dfs.sh first, after
which I execute the above-mentioned script to start the second
datanode.
Did you try to do what the error message asks you to? Better still,
you should try to find where the message is coming from. I realize
this is not a particularly useful reply for a user, but for a
developer, I hope it is.
I just wrote the example script in the mail editor; I did not test
it. Maybe an 'export' before setting the HADOOP_* env variables in the
script is required. Currently I use a different (a bit less elegant)
method for starting multiple nodes. When I switch to this method, I
will post the script.
Better still, post your script once you get it working.
Raghu.
I also tried giving a separate, changed configuration from a separate
directory by executing the command,
*bin/hadoop-daemons.sh --config <config-directory-path> start
datanode*
Still it gives the same message as above.
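For the --config route to work, the alternate config directory also has to override everything that clashes with the first datanode: the storage directory and all three datanode ports. A sketch of the hadoop-site.xml one might put there (the property names mirror the -D flags in Raghu's script; the paths and port values here are illustrative assumptions, not taken from this thread):

```xml
<!-- hadoop-site.xml in the alternate --config directory.
     dfs.data.dir and the ports must differ from the first datanode's;
     the values below are placeholders. -->
<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/some/dir/dfs2/data</value>
  </property>
  <property>
    <name>dfs.datanode.address</name>
    <value>0.0.0.0:50012</value>
  </property>
  <property>
    <name>dfs.datanode.http.address</name>
    <value>0.0.0.0:50082</value>
  </property>
  <property>
    <name>dfs.datanode.ipc.address</name>
    <value>0.0.0.0:50022</value>
  </property>
</configuration>
```

Note that the "datanode running as process" error itself comes from the pid file check, so HADOOP_PID_DIR would still need to point somewhere else as well.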
Also, earlier in this thread Ramya mentioned DataNodeCluster.java.
This will help, but I am not sure how to execute this class. Can you
please help with this?
thanks,
-Ajit.
On Thu, Feb 26, 2009 at 6:43 PM, Raghu Angadi <rang...@yahoo-
inc.com> wrote:
You can run one with a small shell script. You need to override a
couple of environment and config variables.
Something like:
run_datanode () {
  # $1 = start|stop, $2 = datanode number (a single digit)
  DN=$2
  # export these so that hadoop-daemon.sh (a child process) sees them
  export HADOOP_LOG_DIR=logs$DN
  export HADOOP_PID_DIR=$HADOOP_LOG_DIR
  bin/hadoop-daemon.sh $1 datanode \
    -Dhadoop.tmp.dir=/some/dir/dfs$DN \
    -Ddfs.datanode.address=0.0.0.0:5001$DN \
    -Ddfs.datanode.http.address=0.0.0.0:5008$DN \
    -Ddfs.datanode.ipc.address=0.0.0.0:5002$DN
}
You can start the second datanode like: run_datanode start 2
Pretty useful for testing.
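The port trick in the script works by appending the datanode number as the last digit of each base port (5001N for data, 5008N for http, 5002N for ipc). A tiny standalone sketch of just that scheme, printing the addresses a given datanode number would get:

```shell
# Standalone sketch of the port scheme from run_datanode above:
# datanode N listens on 5001N (data), 5008N (http), 5002N (ipc).
dn_addresses () {
  DN=$1
  echo "dfs.datanode.address=0.0.0.0:5001$DN"
  echo "dfs.datanode.http.address=0.0.0.0:5008$DN"
  echo "dfs.datanode.ipc.address=0.0.0.0:5002$DN"
}

dn_addresses 2
```

Note this only works for single-digit datanode numbers: DN=10 would yield 500110, which is outside the valid port range.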
Raghu.
Ajit Ratnaparkhi wrote:
Raghu,
Can you please tell me how to run multiple datanodes on one machine?
thanks,
-Ajit.
On Thu, Feb 26, 2009 at 9:23 AM, Pradeep Fernando <[email protected]
wrote:
Raghu,
I guess you are asking if it would be more convenient if one had
access to a larger cluster for development.
exactly.....
I have access to many machines and clusters, but about 99% of my
development happens using a single machine for testing. I would guess
that is true for most Hadoop developers.
Well, this is the answer I was looking for.... :D
It seems I have enough resources to contribute to this project.
Thanks a lot, Raghu.
regards,
Pradeep Fernando.