Doug Cutting wrote:
> So you want to use different sets of machines for dfs than for
> MapReduce? An easy way to achieve this is to install Hadoop separately
> and start dfs only there ('bin/hadoop-daemon.sh start namenode;
> bin/hadoop-daemons.sh start datanode', or use the new bin/start-dfs.sh
> script). Then, in your Nutch installation, start only the MapReduce
> daemons, using a different conf/slaves file ('bin/hadoop-daemon.sh start
> jobtracker; bin/hadoop-daemons.sh start tasktracker', or use the new
> bin/start-mapred.sh script). Just make sure that your Nutch
> installation is configured to talk to the same namenode as your Hadoop
> installation, and make sure that you don't run bin/start-all.sh from
> either installation. Does that make sense?
That makes complete sense. Conceptually, the namenode is the DFS daemon that
tracks which machines DFS lives on, and all the MapReduce machines always
point at that one namenode. To run MapReduce on a subset of the machines, I
just use a different conf/slaves file in the Nutch installation. I like that
the start scripts are now more disjoint. Thanks.
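
For concreteness, here's a sketch of what I gather I'll end up running. The
commands are the ones you listed; the hostname 'master' and port 9000 are
placeholders for wherever the namenode actually runs in my setup:

    # In the Hadoop installation, whose conf/slaves lists the DFS machines:
    bin/hadoop-daemon.sh start namenode
    bin/hadoop-daemons.sh start datanode
    # (or just: bin/start-dfs.sh)

    # In the Nutch installation, whose conf/slaves lists the MapReduce machines:
    bin/hadoop-daemon.sh start jobtracker
    bin/hadoop-daemons.sh start tasktracker
    # (or just: bin/start-mapred.sh)

And in the Nutch installation's conf/hadoop-site.xml, something like this to
point it at the same namenode (again, host and port are placeholders):

    <property>
      <name>fs.default.name</name>
      <value>master:9000</value>
    </property>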