Scott Simpson wrote:
Excuse my ignorance on this issue. Say I have 5 machines in my Hadoop cluster and I only list two of them in the configuration file when I do a "fetch" or a "generate". Won't this just store the data on the two nodes since that is all I've listed for my crawling machines? I'm trying to crawl on two but store my data across all five.
So you want to use different sets of machines for dfs than for MapReduce? An easy way to achieve this is to install Hadoop separately and start dfs only there ('bin/hadoop-daemon.sh start namenode; bin/hadoop-daemons.sh start datanode', or use the new bin/start-dfs.sh script).

Then, in your Nutch installation, start only the MapReduce daemons, using a different conf/slaves file ('bin/hadoop-daemon.sh start jobtracker; bin/hadoop-daemons.sh start tasktracker', or use the new bin/start-mapred.sh script).

Just make sure that your Nutch installation is configured to talk to the same namenode as your Hadoop installation, and make sure that you don't run bin/start-all.sh from either installation. Does that make sense?
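To make that concrete, here is a rough sketch of the two startup sequences. The install paths (/opt/hadoop, /opt/nutch) and host/port names are placeholders I'm inventing for illustration; the daemon scripts themselves are the ones named above.

```shell
# --- On the standalone Hadoop installation (DFS only) ---
# Its conf/slaves file lists all 5 machines, so datanodes start everywhere.
cd /opt/hadoop            # placeholder path
bin/hadoop-daemon.sh start namenode
bin/hadoop-daemons.sh start datanode
# or, equivalently, the newer convenience script:
# bin/start-dfs.sh

# --- On the Nutch installation (MapReduce only) ---
# Its conf/slaves file lists only the 2 crawling machines,
# so tasktrackers start on just those two.
cd /opt/nutch             # placeholder path
bin/hadoop-daemon.sh start jobtracker
bin/hadoop-daemons.sh start tasktracker
# or, equivalently:
# bin/start-mapred.sh
```

And "configured to talk to the same namenode" means the Nutch side's Hadoop config should point fs.default.name at the namenode started by the other installation, something like (host and port are again placeholders):

```
<property>
  <name>fs.default.name</name>
  <value>namenode-host:9000</value>
</property>
```

Crucially, neither installation should run bin/start-all.sh, since that script starts both the dfs and the MapReduce daemons from one slaves file, which is exactly what you're trying to avoid.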
Doug