There is definitely something to be said for developing via TDD as Lohit mentioned.

Hadoop has an extensive set of tools for writing unit tests that run on simulated clusters (see http://www.cloudera.com/blog/2008/12/16/testing-hadoop/ for an excellent tutorial). This will save you time in the long run: your tests can be contributed along with the actual patch, and there's no need to muck about with configuring clusters, manually starting datanodes, etc.

Actually needing a cluster to test or develop patches against is pretty rare and indicative of a problem somewhere else.

-Jakob



On Mar 4, 2009, at 11:08 AM, Raghu Angadi wrote:

Ajit Ratnaparkhi wrote:
Hi,
thanks for your help.
I tried the above-mentioned script (the one mentioned by Raghu), but whenever I execute it, the following message gets displayed:
*datanode running as process <process_id>. Stop it first.*
I am starting the single-node cluster with bin/start-dfs.sh first, after which I am executing the above script to start a second datanode.

Did you try to do what the error message asks you to? Better still, you should try to find where the message is coming from. I realize this is not a particularly useful reply for a user, but for a developer, I hope it is.
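For anyone chasing it down: that message comes from the pid-file guard in bin/hadoop-daemon.sh. A paraphrased sketch of the check (not the exact source; the pid-file path here is illustrative):

```shell
# Paraphrased sketch of the pid-file guard in bin/hadoop-daemon.sh:
# if the pid file already exists and that process is still alive, refuse
# to start. A second datanode launched with the same HADOOP_PID_DIR finds
# the first one's pid file and trips this check.
PID_FILE=/tmp/demo-datanode.pid          # illustrative path
echo $$ > "$PID_FILE"                    # pretend a datanode already recorded its pid
MSG=""
if [ -f "$PID_FILE" ] && kill -0 "$(cat "$PID_FILE")" 2>/dev/null; then
  MSG="datanode running as process $(cat "$PID_FILE"). Stop it first."
fi
echo "$MSG"
rm -f "$PID_FILE"
```

This is why Raghu's script overrides HADOOP_PID_DIR per datanode: each instance then writes its own pid file and the guard no longer fires.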

I just wrote the example script in the mail editor and did not test it; maybe an 'export' before setting the HADOOP_* env variables in the script is required. Currently I use a different (a bit less elegant) method for starting multiple nodes. When I switch to this method, I will post the script.

Better still, post your script once you get it working.

Raghu.

I also tried giving a separate, changed configuration from a separate config directory by executing:
*bin/hadoop-daemons.sh --config <config-directory-path> start datanode*
It still gives the same message as above.
Also, earlier in this thread Ramya mentioned DataNodeCluster.java. This will help, but I am not sure how to execute this class. Can you please help with this?
thanks,
-Ajit.
On Thu, Feb 26, 2009 at 6:43 PM, Raghu Angadi <rang...@yahoo-inc.com> wrote:
You can run it with a small shell script. You need to override a couple of environment and config variables.

something like:

run_datanode () {
      # $1 = start|stop, $2 = datanode number
      DN=$2
      # export so that bin/hadoop-daemon.sh sees per-datanode log and pid dirs
      export HADOOP_LOG_DIR=logs$DN
      export HADOOP_PID_DIR=$HADOOP_LOG_DIR
      bin/hadoop-daemon.sh $1 datanode \
        -Dhadoop.tmp.dir=/some/dir/dfs$DN \
        -Ddfs.datanode.address=0.0.0.0:5001$DN \
        -Ddfs.datanode.http.address=0.0.0.0:5008$DN \
        -Ddfs.datanode.ipc.address=0.0.0.0:5002$DN
}

You can start a second datanode like: run_datanode start 2
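To make the port trick explicit: each -D override simply appends the datanode number to a port prefix by string concatenation, so every instance gets its own, non-clashing ports. A minimal sketch for datanode 2:

```shell
# The datanode number is appended to each port prefix by plain string
# concatenation, giving each instance distinct ports.
DN=2
DATA_PORT=5001$DN    # dfs.datanode.address      -> 50012
HTTP_PORT=5008$DN    # dfs.datanode.http.address -> 50082
IPC_PORT=5002$DN     # dfs.datanode.ipc.address  -> 50022
echo "$DATA_PORT $HTTP_PORT $IPC_PORT"
```

Note this only works for single-digit datanode numbers; anything larger would push the ports out of the usual range.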

Pretty useful for testing.

Raghu.


Ajit Ratnaparkhi wrote:

Raghu,

Can you please tell me how to run multiple datanodes on one machine?

thanks,
-Ajit.

On Thu, Feb 26, 2009 at 9:23 AM, Pradeep Fernando <[email protected]> wrote:
Raghu,

I guess you are asking if it would be more convenient if one had access to a larger cluster for development.

exactly.....

I have access to many machines and clusters.. but about 99% of my development happens using a single machine for testing. I would guess that is true for most of the Hadoop developers.

well this is the answer I was looking for....  :D
Seems I have enough resources to contribute to this project.
Thanks a lot, Raghu.

regards,
Pradeep Fernando.



