On Feb 12, 2009, at 2:54 AM, Deepak wrote:
Hi, we're running a Hadoop cluster on 4 nodes; our primary purpose is to provide a distributed storage solution for internal applications here at TellyTopia Inc. Our cluster consists of non-identical nodes (one with 1TB, two with 3TB, and one with 60GB). While copying data onto HDFS, we noticed that the node with 60GB of storage ran out of disk space, and even the balancer couldn't balance because the cluster had stopped. Now my questions are: 1. Is Hadoop suitable for non-identical cluster nodes?
Yes. Our cluster has between 60GB and 40TB on our nodes. The majority have around 3TB.
2. Is there any way to automatically balance the nodes?
We have a cron script which automatically starts the Balancer. It's dirty, but it works.
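A setup like the one Brian describes might look roughly like this. This is only a sketch, not Brian's actual script: the install path, schedule, threshold, and lock-file name are all assumptions.

```shell
#!/bin/sh
# run-balancer.sh -- hypothetical cron-driven HDFS balancer wrapper.
# Assumptions (not from the original post): Hadoop lives under
# /usr/local/hadoop, and a 10% utilization threshold is acceptable.

HADOOP_HOME=/usr/local/hadoop
LOCKFILE=/tmp/hdfs-balancer.lock

# Skip this run if a previous balancer invocation is still going.
if [ -e "$LOCKFILE" ]; then
    echo "Balancer already running; exiting." >&2
    exit 0
fi
touch "$LOCKFILE"

# Move blocks until every DataNode is within 10 percentage points
# of the cluster's average utilization; the balancer exits on its
# own once no further block moves are needed.
"$HADOOP_HOME/bin/hadoop" balancer -threshold 10

rm -f "$LOCKFILE"
```

A crontab entry (here, nightly at 02:00) would then drive it:

```shell
# m h dom mon dow  command
0 2 * * * /usr/local/hadoop/bin/run-balancer.sh >> /var/log/hdfs-balancer.log 2>&1
```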
3. Why does the Hadoop cluster stop when one node runs out of disk?
That's not normal. Trust me, if that was always true, we'd be perpetually screwed :)
There might be some other underlying error you're missing... Brian
Any further inputs are appreciated! Cheers, Deepak TellyTopia Inc.