Grant Ingersoll wrote:
Hi,
I want to put Hadoop into my webapp and have it start up when the
servlet starts up. Based on the shell scripts, I think I would need
to do the following:
1. Package the appropriate libraries and config files (hadoop-site,
slaves, etc.) into the webapp
2. (based on start-all.sh)
a. Start the namenode: org.apache.hadoop.dfs.NameNode (need to
look into what is in the main() method to make sure I construct/invoke
this correctly)
b. Start the datanode: org.apache.hadoop.dfs.DataNode
c. Start the JobTracker: org.apache.hadoop.mapred.JobTracker
d. Start the TaskTracker: org.apache.hadoop.mapred.TaskTracker
I am new to Hadoop, so is this reasonable? What am I missing?
You need to make sure that you run only a single instance of the namenode
and of the jobtracker per cluster. Then you can run one datanode and one
tasktracker on each cluster node.
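The startup sequence in step 2 might be sketched roughly as below. This is
a hypothetical starting point, not a tested recipe: the class names are the
ones the shell scripts invoke, reflection avoids a compile-time dependency
on the Hadoop jars, and it assumes hadoop-site.xml etc. from step 1 are on
the webapp's classpath. You would call something like HadoopLauncher from
your servlet's init().

```java
// Hypothetical sketch: start the four Hadoop daemons in-process by
// invoking each class's main(), the same entry point start-all.sh uses.
public class HadoopLauncher {

    // Each daemon's main() blocks, so give it its own thread.
    static Thread startDaemon(final String className) {
        Thread t = new Thread(new Runnable() {
            public void run() {
                try {
                    Class.forName(className)
                         .getMethod("main", String[].class)
                         .invoke(null, (Object) new String[0]);
                } catch (Exception e) {
                    // missing class, bad config, bind failure, ...
                    e.printStackTrace();
                }
            }
        }, className);
        t.setDaemon(true); // don't keep the JVM alive on webapp shutdown
        t.start();
        return t;
    }

    public static void main(String[] args) {
        startDaemon("org.apache.hadoop.dfs.NameNode");       // one per cluster
        startDaemon("org.apache.hadoop.dfs.DataNode");       // one per node
        startDaemon("org.apache.hadoop.mapred.JobTracker");  // one per cluster
        startDaemon("org.apache.hadoop.mapred.TaskTracker"); // one per node
    }
}
```

Note that you would also want an orderly shutdown hook, since the daemons
hold open ports and on-disk state.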
Also, is it possible to dynamically register slave nodes? I have been
looking a little bit at zeroconf/bonjour network stuff and was
wondering if it could be used to bring resources online automatically
(would limit the nodes to a subnet, but that is fine for my needs).
Yes, just point them at the same namenode (for datanodes) and jobtracker
(for tasktrackers), and they will join the cluster automatically. The
conf/slaves file is only used for the initial startup of the cluster.
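Concretely, a new slave finds the cluster through its own hadoop-site.xml;
a minimal example (the host name and ports here are placeholders, not
prescribed values):

```xml
<?xml version="1.0"?>
<!-- hadoop-site.xml on each slave; master.example.com is a placeholder -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>master.example.com:9000</value> <!-- the namenode -->
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>master.example.com:9001</value> <!-- the jobtracker -->
  </property>
</configuration>
```

Start a datanode and tasktracker on the new machine with these settings and
it registers itself; no entry in conf/slaves is needed.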
--
Best regards,
Andrzej Bialecki <><
  ___. ___ ___ ___ _ _   __________________________________
 [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
 ___|||__||  \|  || |   Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com