Paul,

You can inspect the data held by your new nodes after the balancer operation runs. "hadoop dfsadmin -report" should give you detailed stats about each of the DNs, or look at the fsck output ("hadoop fsck /").
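A minimal sketch of those two inspection commands, wrapped so the script no-ops cleanly on a machine where the `hadoop` CLI is not installed (the guard is my addition, not part of the original advice):

```shell
#!/bin/sh
# Sketch: inspect per-DataNode usage after a balancer run.
# Assumes the `hadoop` CLI is on PATH; skip gracefully if it is not.
if command -v hadoop >/dev/null 2>&1; then
  # Per-DataNode capacity, DFS used, and remaining space:
  hadoop dfsadmin -report

  # Block-level health check of the whole namespace:
  hadoop fsck /
else
  echo "hadoop CLI not found; skipping inspection commands" >&2
fi
```

The `-report` output shows each DataNode's used/remaining space, which is the quickest way to confirm that blocks actually landed on the new node.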
(Note: by default, the balancer operation is bandwidth-limited for performance reasons and may take a while to complete -- although this is configurable.)

On Fri, Jul 1, 2011 at 10:42 AM, Paul Rimba <paul.ri...@gmail.com> wrote:
> Hey Matei,
>
> What if you do bin/hadoop-daemon.sh start tasktracker and
> bin/hadoop-daemon.sh start datanode? Does it move the old data to the new
> slave?
>
> I ran that scenario a couple of times and then ran start-balancer.sh. It
> always says that the cluster is balanced. Does that mean the data has been
> spread out?
>
> Thanks,
> Paul
>
> On Fri, Jul 1, 2011 at 2:05 PM, Matei Zaharia <ma...@eecs.berkeley.edu> wrote:
>>
>> You can have a new TaskTracker or DataNode join the cluster by just
>> starting that daemon on the slave (e.g. bin/hadoop-daemon.sh start
>> tasktracker) and making sure it is configured to connect to the right
>> JobTracker or NameNode (through the mapred.job.tracker and fs.default.name
>> properties in the config files). The slaves file is only used for the
>> bin/start-* and bin/stop-* scripts; Hadoop doesn't look at it at runtime.
>> There may be other similar files that it can look at, though, such as a
>> blacklist, but I think that in the default configuration you can just
>> launch the daemon and it will work.
>>
>> Note that if you add a new DataNode, Hadoop won't automatically move old
>> data to it (to spread the data out across the cluster) unless you run the
>> HDFS rebalancer, at least as far as I know.
>>
>> Matei
>>
>> On Jun 30, 2011, at 8:56 PM, Paul Rimba wrote:
>>
>> Hey there,
>>
>> I am trying to add a new datanode/tasktracker to a currently running
>> cluster. Is this feasible? And if yes, how do I change the masters, slaves,
>> and dfs.replication (in hdfs-site.xml) configuration? Can I add the new
>> slave to the slaves configuration file while the cluster is running?
>>
>> I found this "./bin/hadoop dfs -setrep -w 4 /path/to/file" command to
>> change the dfs.replication on the fly.
>> Is there a better way to do it?
>>
>> Thank you for your kind attention.
>>
>> Kind Regards,
>> Paul

--
Harsh J
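Pulling the thread together, the whole flow can be sketched as a short script. This is an illustrative outline, not a definitive procedure: it assumes a Hadoop-0.20-era layout run from the install directory, `/path/to/file` is a placeholder from the thread, and the guard lets it no-op where Hadoop is absent.

```shell
#!/bin/sh
# Sketch of the steps discussed for adding a slave to a live cluster.
# Guard: only run if the Hadoop scripts are actually present here.
if [ -x bin/hadoop-daemon.sh ]; then
  # 1. On the new slave, start the daemons; they join the cluster as long
  #    as fs.default.name and mapred.job.tracker point at the right
  #    masters. The slaves file is only read by the start-*/stop-* scripts.
  bin/hadoop-daemon.sh start datanode
  bin/hadoop-daemon.sh start tasktracker

  # 2. Spread existing blocks onto the new node. The balancer's transfer
  #    rate is capped (configurable via dfs.balance.bandwidthPerSec in
  #    hdfs-site.xml in this Hadoop generation), so it can take a while.
  bin/start-balancer.sh

  # 3. Optionally change the replication of an existing file on the fly;
  #    -w waits until the new replicas are in place.
  bin/hadoop dfs -setrep -w 4 /path/to/file
else
  echo "Hadoop scripts not found in ./bin; skipping" >&2
fi
```

Afterwards, "hadoop dfsadmin -report" shows whether the new DataNode is holding a fair share of the blocks, as Harsh notes at the top of the thread.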