Grant Ingersoll wrote:
Hi,

I want to put Hadoop into my webapp and have it start up when the servlet starts up. Based on the shell scripts, I think I would need to do the following:

1. Package the appropriate libraries and config files (hadoop-site, slaves, etc.) into the webapp

2. (based on start-all.sh)
     a. Start the namenode: org.apache.hadoop.dfs.NameNode (need to look into what is in its main() method to make sure I construct/invoke it correctly)
     b. Start the datanode: org.apache.hadoop.dfs.DataNode
     c. Start the jobtracker: org.apache.hadoop.mapred.JobTracker
     d. Start the tasktracker: org.apache.hadoop.mapred.TaskTracker
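A rough, untested sketch of what step 2 might look like from a servlet lifecycle hook, assuming the Hadoop and servlet jars are on the webapp classpath and the config files from step 1 are on it too. The listener name is hypothetical, and each daemon's main() blocks, so each one gets its own thread:

```java
import javax.servlet.ServletContextEvent;
import javax.servlet.ServletContextListener;

// Hypothetical listener; register it in web.xml as a <listener>.
public class HadoopStartupListener implements ServletContextListener {

    public void contextInitialized(ServletContextEvent e) {
        // Class names as discussed in the thread (pre-0.20 package layout).
        daemon("org.apache.hadoop.dfs.NameNode");
        daemon("org.apache.hadoop.dfs.DataNode");
        daemon("org.apache.hadoop.mapred.JobTracker");
        daemon("org.apache.hadoop.mapred.TaskTracker");
    }

    // Invoke the daemon's main(String[]) reflectively on its own thread,
    // since main() runs until the daemon shuts down.
    private void daemon(final String className) {
        Thread t = new Thread(new Runnable() {
            public void run() {
                try {
                    Class.forName(className)
                         .getMethod("main", String[].class)
                         .invoke(null, (Object) new String[0]);
                } catch (Exception ex) {
                    ex.printStackTrace();
                }
            }
        }, className);
        t.setDaemon(true);
        t.start();
    }

    public void contextDestroyed(ServletContextEvent e) {
        // In a real deployment you would shut each daemon down cleanly here.
    }
}
```

Whether calling main() is the right entry point (versus constructing the daemon objects directly) is exactly the open question in 2a above.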

I am new to Hadoop, so is this reasonable?  What am I missing?


You need to make sure that you run only a single instance of the namenode and jobtracker per cluster. Then you can run one datanode and one tasktracker per cluster node.



Also, is it possible to dynamically register slave nodes? I have been looking a little bit at zeroconf/bonjour network stuff and was wondering if it could be used to bring resources online automatically (would limit the nodes to a subnet, but that is fine for my needs).

Yes: just point them at the same namenode (for datanodes) and jobtracker (for tasktrackers), and they will join the cluster automatically. The conf/slaves file is only used for the initial startup of the cluster.
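Concretely, "pointing them at the same namenode and jobtracker" is just configuration on each slave; a minimal hadoop-site.xml might look like this (hostnames and ports here are placeholders, not from the thread):

```xml
<!-- hadoop-site.xml on each slave node -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <!-- namenode address: datanodes register here -->
    <value>master.example.com:9000</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <!-- jobtracker address: tasktrackers register here -->
    <value>master.example.com:9001</value>
  </property>
</configuration>
```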

--
Best regards,
Andrzej Bialecki     <><
___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
