Despite its name, the conf/masters file defines the machines on which Hadoop will start secondary NameNodes in a multi-node cluster. In some cases, this is just the master machine. The primary NameNode and the JobTracker will always be the machines on which you run the bin/start-dfs.sh and bin/start-mapred.sh scripts, respectively (the primary NameNode and the JobTracker will be started on the same machine if you run bin/start-all.sh). Note that you can also start a Hadoop daemon manually on a machine via bin/hadoop-daemon.sh start [namenode | secondarynamenode | datanode | jobtracker | tasktracker], which will not take the conf/masters and conf/slaves files into account.
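To make the difference concrete, here is a rough sketch of the two settings side by side. The hostnames (master, snn1) and the port 54311 are made up for illustration and are not taken from your cluster; substitute your own values.

conf/masters (read only by the start scripts; one host per line, each gets a secondary NameNode):

```
snn1
```

conf/mapred-site.xml (read by the daemons and clients on each node to locate the JobTracker):

```xml
<!-- Assumed values: "master" and port 54311 are placeholders. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>master:54311</value>
  </property>
</configuration>
```

So in a setup like this you would set mapred.job.tracker on every node, and list a host in conf/masters only where you want a secondary NameNode started.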
Here are more details regarding the conf/masters file, taken from the Hadoop HDFS user guide:

    The secondary NameNode merges the fsimage and the edits log files
    periodically and keeps edits log size within a limit. It is usually run
    on a different machine than the primary NameNode since its memory
    requirements are on the same order as the primary NameNode. The
    secondary NameNode is started by bin/start-dfs.sh on the nodes
    specified in conf/masters file.

On Tue, Aug 7, 2012 at 8:42 AM, Karthik Kambatla <ka...@cloudera.com> wrote:

> My understanding is that conf/masters is used only by the bin/ or sbin/
> scripts - like start-*.sh, while the conf property is used by the Hadoop
> daemons running on that particular node.
>
> Hope that helps,
> Karthik
>
> On Mon, Aug 6, 2012 at 5:32 PM, Momina Khan <momina.a...@gmail.com> wrote:
>
> > Hi,
> > I am trying to setup a hadoop cluster and I was wondering what is
> > difference in specifying the Jobtracker IP in *mapred.job.tracker* in
> > *mapred-site.xml* and noting the same IP in *conf/masters* file? Do I need
> > to do both or just one. If I need to do both, is there a difference in how
> > the two are used?
> >
> > thankx
> > momina
>

--
Don't Grow Old, Grow Up... :-)