On Sat, Apr 4, 2009 at 10:25 PM, Foss User <[email protected]> wrote:
> On Sun, Apr 5, 2009 at 10:27 AM, Todd Lipcon <[email protected]> wrote:
> > On Sat, Apr 4, 2009 at 3:47 AM, Foss User <[email protected]> wrote:
> >>
> >> 1. Should I edit conf/slaves on all nodes or only on the name node? Do
> >> I have to edit it on the job tracker too?
> >>
> >
> > The conf/slaves file is only used by the start/stop scripts (e.g.
> > start-all.sh). This script is just a handy wrapper that sshes to all of
> > the slaves to start the datanodes/tasktrackers on those machines. So,
> > you should edit conf/slaves on whatever machine you tend to run those
> > administrative scripts from, but those scripts are for convenience only
> > and not necessary. You can start the datanode/tasktracker services on
> > the slave nodes manually and it will work just the same.
>
> What are the commands to start the data node and task tracker on a slave
> machine?
>

With the vanilla hadoop distribution:

  $HADOOP_HOME/bin/hadoop-daemon.sh start datanode
  $HADOOP_HOME/bin/hadoop-daemon.sh start tasktracker

Or, if you're using the Cloudera Distribution for Hadoop, you should start
them as standard linux services (e.g. /etc/init.d/hadoop-datanode start).

> >> 5. When I add a new slave to the cluster later, do I need to run the
> >> namenode -format command again? If I have to, how do I ensure that the
> >> existing data is not lost? If I don't have to, how will the folders
> >> necessary for HDFS be created on the new slave machine?
> >>
> >
> > No - after starting the slave, the NN and JT will start assigning
> > blocks/jobs to the new slave immediately. The HDFS directories will be
> > created when you start up the datanode - you just need to ensure that
> > the directory configured in dfs.data.dir exists and is writable by the
> > hadoop user.
>
> All these days when I was working, dfs.data.dir was something like
> /tmp/hadoop-hadoop/dfs/data, but this directory never existed. Only /tmp
> existed and it was writable by Hadoop. On starting the namenode on the
> master, this directory was created automatically on the master as well
> as on all the slaves.
>

Starting just the namenode won't create the data dirs on the slaves. If you
used the start-dfs.sh script, it sshed into the slaves and started the
datanode on each of them, and that is what created the data dirs.

> So, are you correct in saying that the directory configured in
> dfs.data.dir should already exist? Isn't it more like the directory
> configured in dfs.data.dir would be created automatically if it doesn't
> exist? The only thing is that the hadoop user should have permission to
> create it. Am I right?
>

Correct - sorry if I wasn't clear on that. The hadoop user needs to be able
to perform the equivalent of "mkdir -p" on the dfs.data.dir path.

Having dfs.data.dir in /tmp is a default setting that you should definitely
change, though. On most systems /tmp is cleared by a cron job as well as at
boot.

-Todd
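
To make that last point concrete, here is a minimal sketch of moving
dfs.data.dir off /tmp on a slave. The path /data/hadoop/dfs/data and the
hadoop:hadoop user/group are assumptions chosen for the example, not
anything from the thread; substitute whatever local disk location and
account your cluster actually uses.

  # Run as root on each slave before (re)starting its datanode.
  # The path and the "hadoop" user/group below are assumed placeholders.
  mkdir -p /data/hadoop/dfs/data
  chown -R hadoop:hadoop /data/hadoop/dfs/data

  # Then point dfs.data.dir at that path in conf/hadoop-site.xml
  # (conf/hdfs-site.xml on 0.20 and later), e.g.:
  #   <property>
  #     <name>dfs.data.dir</name>
  #     <value>/data/hadoop/dfs/data</value>
  #   </property>

On the next datanode start, Hadoop lays out its block storage under that
directory on its own; the directory only has to exist (or be creatable, per
the "mkdir -p" point above) and be writable by the hadoop user.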

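For the conf/slaves question earlier in the thread, a minimal sketch of the
convenience-script path, assuming passwordless ssh from the machine you run
the scripts on to every slave (the hostnames are placeholders):

  # conf/slaves on the admin machine lists one slave hostname per line:
  #   slave1.example.com
  #   slave2.example.com

  # The wrapper scripts then ssh to each listed host and start the daemons:
  bin/start-dfs.sh      # namenode locally, a datanode on each slave
  bin/start-mapred.sh   # jobtracker locally, a tasktracker on each slave
  # bin/start-all.sh simply runs both of the above.

As Todd notes, none of this is required: a slave started by hand with
hadoop-daemon.sh joins the cluster just the same, whether or not it is
listed in conf/slaves.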