Hi,
I have a pseudo-distributed Hadoop cluster set up, and I'm currently hoping to put about 100 GB of files on it to play around with. I've got a Unix box at work that no one else is using for this, and running df -h on it, I get:
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1             7.9G  2.4G  5.2G  31% /
none                  3.8G     0  3.8G   0% /dev/shm
/dev/sdb              414G  210M  393G   1% /mnt

Alright, so /mnt looks quite big and seems like a good place to store my HDFS files. I go ahead and create a folder named hadoop-data there and set the following in hdfs-site.xml:

<property>
  <!-- where hadoop stores its files (datanodes only) -->
  <name>dfs.name.dir</name>
  <value>/mnt/hadoop-data</value>
</property>
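For what it's worth, the folder itself was created in the obvious way, more or less as below (Hadoop runs as my own user in this pseudo-distributed setup, so I don't think it's a permissions problem):

# /mnt is root-owned, so create the directory with sudo and hand it to my user
sudo mkdir /mnt/hadoop-data
sudo chown $USER /mnt/hadoop-data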

After a bit of troubleshooting, I restart the cluster and try to put a couple of test files onto HDFS. Doing an ls of hadoop-data, I see:

$ ls
current  image  in_use.lock  previous.checkpoint
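In case it matters, the restart-and-test sequence was roughly the standard scripts; as far as I recall, the "bit of troubleshooting" was mostly having to reformat the namenode so it would start against the new, empty directory. Something along these lines:

# stop everything, reformat the namenode against the new dfs.name.dir, bring it back up
bin/stop-all.sh
bin/hadoop namenode -format
bin/start-all.sh

# quick sanity-check upload of a tiny file
echo hello > test.txt
bin/hadoop fs -put test.txt /test.txt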

OK, things look good. Time to try uploading some real data, and here's where the problem arises. If I add a 10 MB dummy file to /mnt/hadoop-data through regular Unix commands (outside of Hadoop) and run df -h, the used space on /mnt goes up by exactly 10 MB. But when I start running a big dump of data through:

hadoop fs -put ~/hadoop_playground/data2/data2/ /data/

I notice from df -h that the data seems to end up in completely the wrong place! In the output below, only the usage of /dev/sda1 has increased; /mnt hasn't moved at all.

Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1             7.9G  3.4G  4.2G  45% /
none                  3.8G     0  3.8G   0% /dev/shm
/dev/sdb              414G  210M  393G   1% /mnt
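In case the exact commands are useful, this is more or less the comparison I've been doing (the exact command for the dummy file may have differed, and the /tmp glob at the end is just a guess at where else Hadoop might be writing):

# 10 MB dummy file written straight to /mnt: df shows /mnt grow by 10 MB, as expected
dd if=/dev/zero of=/mnt/hadoop-data/dummy.bin bs=1M count=10
df -h

# the real upload through HDFS: df shows / grow instead, /mnt doesn't budge
hadoop fs -put ~/hadoop_playground/data2/data2/ /data/
df -h

# trying to work out where the space is actually going (paths are guesses)
du -sh /mnt/hadoop-data /tmp/hadoop-* 2>/dev/null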

So, what gives? Does anyone have any clue how my files can seemingly end up in the hadoop-data folder yet take up space elsewhere? I could see this being a Unix issue rather than a Hadoop one, but I figured I'd ask here just in case, since I'm pretty stumped.

Cheers,
Eli
