You need your dfs.data.dir configured to the bigger disks for data. That config targets the datanodes.
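As a rough sketch of what I mean, assuming a separate directory on the big disk such as /mnt/hadoop-dfs-data (a made-up path, pick whatever suits you, as long as the user running the datanode can write to it), the hdfs-site.xml entry would look something like this:

```xml
<!-- hdfs-site.xml: /mnt/hadoop-dfs-data is a hypothetical path on the large disk -->
<property>
  <!-- where the datanode stores HDFS block data -->
  <name>dfs.data.dir</name>
  <value>/mnt/hadoop-dfs-data</value>
</property>
```

Restart the datanode after changing it so the new setting takes effect.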
The one you've overridden is dfs.name.dir, which is for the namenode's metadata. dfs.data.dir is therefore still at its default and is writing to /tmp on your root disk (which is a bad thing, since /tmp can get wiped after a reboot).

On Mon, Feb 6, 2012 at 9:51 PM, Eli Finkelshteyn <[email protected]> wrote:
> Hi,
> I have a pseudo-distributed Hadoop cluster setup, and I'm currently hoping
> to put about 100 gigs of files on it to play around with. I got a unix box
> at work no one else is using for this, and running a df -h, I get:
> Filesystem      Size  Used Avail Use% Mounted on
> /dev/sda1       7.9G  2.4G  5.2G  31% /
> none            3.8G     0  3.8G   0% /dev/shm
> /dev/sdb        414G  210M  393G   1% /mnt
>
> Alright, so /mnt looks quite big and seems like a good place to store my
> hdfs files. I go ahead and create a folder named hadoop-data there and set
> the following in hdfs-site.xml:
>
> <property>
> <!-- where hadoop stores its files (datanodes only) -->
> <name>dfs.name.dir</name>
> <value>/mnt/hadoop-data</value>
> </property>
>
> After a bit of troubleshooting, I restart the cluster and try to put a
> couple of test files onto HDFS. Doing an ls of hadoop-data, I see:
>
> $ ls
> current image in_use.lock previous.checkpoint
>
> OK, things look good. Time to try uploading some real data. Now, here's
> where the problem arises. If I add a 10mb dummy file to hadoop-data through
> regular unix and run df -h, I see that the used space of /mnt goes up
> exactly 10mb. But, when I start running a big dump of data through:
>
> hadoop fs -put ~/hadoop_playground/data2/data2/ /data/
>
> I notice that running df -h seems to put the data in completely the wrong
> location! Note that below, only the usage of /dev/sda1 has increased. /mnt
> has not moved.
>
> Filesystem      Size  Used Avail Use% Mounted on
> /dev/sda1       7.9G  3.4G  4.2G  45% /
> none            3.8G     0  3.8G   0% /dev/shm
> /dev/sdb        414G  210M  393G   1% /mnt
>
> So, what gives? Anyone have any clue how my files are seemingly both put in
> the hadoop-data folder, but take up space elsewhere? I could see this likely
> being a Unix issue, but I figured I'd ask here just in case it's not, since
> I'm pretty stumped.
>
> Cheers,
> Eli

--
Harsh J
Customer Ops. Engineer
Cloudera | http://tiny.cloudera.com/about
