There is definitely something to be said for developing via TDD as
Lohit mentioned.
Hadoop has an extensive set of tools for writing unit tests that run
on simulated clusters (see http://www.cloudera.com/blog/2008/12/16/testing-hadoop/
for an excellent tutorial). This will save you time in the long run,
because your tests can be contributed along with the actual patch,
and there's no need to muck about with configuring clusters, manually
starting datanodes, etc.
Actually needing a cluster to test or develop patches against is
pretty rare and indicative of a problem somewhere else.
-Jakob
On Mar 4, 2009, at 11:08 AM, Raghu Angadi wrote:
Ajit Ratnaparkhi wrote:
Hi,
Thanks for your help.
I tried the above-mentioned script (the one posted by Raghu), but
whenever I execute it, the following message is displayed:
*datanode running as process <process_id>. Stop it first.*
I start the single-node cluster with bin/start-dfs.sh first, after
which I execute the above-mentioned script to start the second
datanode.
Did you try to do what the error message asks you to? Better still,
you should try to find where the message is coming from. I realize
this is not a particularly useful reply for a user, but for a
developer, I hope it is.
I just wrote the example script in the mail editor; I did not test
it. Maybe an 'export' before setting the HADOOP_* env variables in the
script is required. Currently I use a different (a bit less elegant)
method for starting multiple nodes. When I switch to this method, I
will post the script.
Better still, post your script once you get it working.
Raghu.
I also tried giving a separate, changed configuration from a separate
directory by executing the command,
*bin/hadoop-daemons.sh --config <config-directory-path> start
datanode*
Still it gives the same message as above.
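For the --config route to work, the alternate config directory also has to override everything that clashes with the first datanode: the storage directory and all three datanode ports. A sketch of the hadoop-site.xml one might put there (the property names mirror the -D flags in Raghu's script; the paths and port values here are illustrative assumptions, not taken from this thread):

```xml
<!-- hadoop-site.xml in the alternate --config directory.
     dfs.data.dir and the ports must differ from the first datanode's;
     the values below are placeholders. -->
<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/some/dir/dfs2/data</value>
  </property>
  <property>
    <name>dfs.datanode.address</name>
    <value>0.0.0.0:50012</value>
  </property>
  <property>
    <name>dfs.datanode.http.address</name>
    <value>0.0.0.0:50082</value>
  </property>
  <property>
    <name>dfs.datanode.ipc.address</name>
    <value>0.0.0.0:50022</value>
  </property>
</configuration>
```

Note that the "datanode running as process" error itself comes from the pid file check, so HADOOP_PID_DIR would still need to point somewhere else as well.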
Also, earlier in this thread Ramya mentioned DataNodeCluster.java.
This will help, but I am not sure how to execute this class. Can you
please help with this?
thanks,
-Ajit.
On Thu, Feb 26, 2009 at 6:43 PM, Raghu Angadi <rang...@yahoo-
inc.com> wrote:
You can run one with a small shell script. You need to override a
couple of environment and config variables.
Something like:
run_datanode () {
  # $1 = start|stop, $2 = datanode number (a single digit)
  DN=$2
  # export these so that hadoop-daemon.sh (a child process) sees them
  export HADOOP_LOG_DIR=logs$DN
  export HADOOP_PID_DIR=$HADOOP_LOG_DIR
  bin/hadoop-daemon.sh $1 datanode \
    -Dhadoop.tmp.dir=/some/dir/dfs$DN \
    -Ddfs.datanode.address=0.0.0.0:5001$DN \
    -Ddfs.datanode.http.address=0.0.0.0:5008$DN \
    -Ddfs.datanode.ipc.address=0.0.0.0:5002$DN
}
You can start the second datanode like: run_datanode start 2
Pretty useful for testing.
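The port trick in the script works by appending the datanode number as the last digit of each base port (5001N for data, 5008N for http, 5002N for ipc). A tiny standalone sketch of just that scheme, printing the addresses a given datanode number would get:

```shell
# Standalone sketch of the port scheme from run_datanode above:
# datanode N listens on 5001N (data), 5008N (http), 5002N (ipc).
dn_addresses () {
  DN=$1
  echo "dfs.datanode.address=0.0.0.0:5001$DN"
  echo "dfs.datanode.http.address=0.0.0.0:5008$DN"
  echo "dfs.datanode.ipc.address=0.0.0.0:5002$DN"
}

dn_addresses 2
```

Note this only works for single-digit datanode numbers: DN=10 would yield 500110, which is outside the valid port range.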
Raghu.
Ajit Ratnaparkhi wrote:
Raghu,
Can you please tell me how to run multiple datanodes on one machine?
thanks,
-Ajit.
On Thu, Feb 26, 2009 at 9:23 AM, Pradeep Fernando <[email protected]
wrote:
Raghu,
I guess you are asking if it would be more convenient if one had
access to a larger cluster for development.
exactly.....
I have access to many machines and clusters, but about 99% of my
development happens using a single machine for testing. I would guess
that is true for most Hadoop developers.
Well, this is the answer I was looking for.... :D
It seems I have enough resources to contribute to this project.
Thanks a lot, Raghu.
regards,
Pradeep Fernando.