On Feb 15, 2009, at 3:21 AM, Deepak wrote:

Thanks Brian and Chen!

I finally sorted out why the cluster was being stopped after running out
of space. It's because of a master failure due to disk space.

Regarding the automatic balancer, I guess in our case the rate of copying is
faster than the balancer's rate; we found the balancer does start but couldn't
perform its job.

There are parameters you can set which control how quickly the balancer is allowed to move blocks about.

Nevertheless, you shouldn't rely on it to work for anything performance critical -- you'll probably want to ensure there's enough space around to do your work in the short-term.
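(For reference: in Hadoop releases of this vintage, the main knob is dfs.balance.bandwidthPerSec in hdfs-site.xml, which caps how much bandwidth each datanode may spend on rebalancing. The value below is only an illustration, not a recommendation.)

```xml
<!-- hdfs-site.xml: per-datanode cap on balancer bandwidth, in bytes per second.
     The default is 1048576 (1 MB/s); 10485760 (10 MB/s) here is just an example. -->
<property>
  <name>dfs.balance.bandwidthPerSec</name>
  <value>10485760</value>
</property>
```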

Brian



Anyway, thanks for your help! It helped me sort out some things.

Cheers,
Deepak

On Thu, Feb 12, 2009 at 5:32 PM, He Chen <air...@gmail.com> wrote:
I think you should confirm your balancer is still running. Did you change
the threshold of the HDFS balancer? Maybe it is too large?

The balancer will stop working when it meets one of these 5 conditions:

1. The datanodes are balanced (obviously you are not this kind);
2. No more blocks can be moved (all blocks on the unbalanced nodes are busy or
recently used);
3. No more blocks have been moved in 20 minutes, over 5 consecutive attempts;
4. Another balancer is working;
5. An I/O exception occurs.


The default threshold is 10% for each datanode: for 1TB that is 100GB, for 3TB
it is 300GB, and for 60GB it is 6GB.
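To make the threshold arithmetic concrete, here is a small sketch (the node sizes are the ones from this thread; the start-balancer.sh invocation at the end is the usual way to pick a tighter threshold, shown here with a hypothetical 5%):

```shell
# 10% default threshold expressed in GB for the node sizes in this thread
for size_gb in 1000 3000 60; do
  echo "${size_gb}GB node -> $((size_gb / 10))GB threshold"
done

# To rerun the balancer with a tighter threshold (e.g. 5%), something like:
#   $HADOOP_HOME/bin/start-balancer.sh -threshold 5
```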

Hope this helps.


On Thu, Feb 12, 2009 at 10:06 AM, Brian Bockelman <bbock...@cse.unl.edu> wrote:


On Feb 12, 2009, at 2:54 AM, Deepak wrote:

Hi,

We're running a Hadoop cluster on 4 nodes; our primary purpose is to
provide a distributed storage solution for internal applications here
at TellyTopia Inc.

Our cluster consists of non-identical nodes (one with 1TB, another two
with 3TB, and one more with 60GB). While copying data onto HDFS, we
noticed that the node with 60GB of storage ran out of disk space, and
even the balancer couldn't balance because the cluster was stopped. Now
my questions are:

1. Is Hadoop suitable for non-identical cluster nodes?


Yes. Our cluster has between 60GB and 40TB on our nodes. The majority
have around 3TB.


2. Is there any way to automatically balance the nodes?


We have a cron script which automatically starts the Balancer. It's dirty,
but it works.
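A minimal version of such a cron job might look like this (the schedule, paths, and log file are assumptions, not Brian's actual script; if a balancer is already running, the new instance simply exits — the "another balancer is working" case):

```
# crontab entry (hypothetical): start the HDFS balancer nightly at 02:00
# with the default 10% threshold, logging its output.
0 2 * * * /opt/hadoop/bin/start-balancer.sh -threshold 10 >> /var/log/hadoop-balancer.log 2>&1
```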


3. Why does the Hadoop cluster stop when one node runs out of disk?


That's not normal. Trust me, if that was always true, we'd be perpetually
screwed :)

There might be some other underlying error you're missing...

Brian


Any further input is appreciated!

Cheers,
Deepak
TellyTopia Inc.





--
Chen He
RCF CSE Dept.
University of Nebraska-Lincoln
US




--
Deepak
TellyTopia Inc.
