The short answer is you need more HDFS datanodes. It's a question of trying to do too much at peak load with too few cluster resources.
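For context (my addition, not something stated in the original thread): with only 3 datanodes and HDFS's default replication factor of 3, every block write pipeline must use all three nodes, so there is no spare datanode to substitute if one of them is overloaded during a heavy load, which is one common way to end up at "Unable to create new block". Beyond adding datanodes, a stopgap some people use on small test clusters is lowering replication, at the cost of durability. A sketch of that setting in hdfs-site.xml:

```
<!-- hdfs-site.xml: sketch only; lowering replication trades durability
     for headroom on a 3-node cluster and is not a substitute for adding
     datanodes as recommended above. -->
<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>
```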
> Brief reminder: I have a small cluster, 3 regionservers
> (+datanodes), 1 master (+namenode).
> We perform a massive load of data into hbase every few
> minutes.

I think you can see how these two things are in conflict with each other.

> 2010-04-17 10:08:07,270 WARN
> org.apache.hadoop.hdfs.DFSClient: DataStreamer
> Exception: java.io.IOException: Unable to create new
> block.

This is an (un)helpful message providing a fairly clear indication that you need to increase the resources available in your cluster so it can handle the peak loads you are imposing on it.

The longer answer, at least for HBase, is HBASE-2183 (Ride Over Restart). My team at Trend Micro will, over the next couple of months, work to make HBase more resilient to problems at the HDFS layer. Currently HBase region servers shut down in most cases when they take exceptions from the filesystem. This is a simple and effective strategy for avoiding integrity and corruption problems, so when you see the region servers shut down on your cluster because you are overstressing HDFS, they are functioning correctly. However, we want to go over the code paths which touch the filesystem and decide on a case-by-case basis whether there is an alternative which can improve the overall availability of the service should something happen on a given arc.

Best regards,

   - Andy
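[Editor's note, not part of the original reply: the fail-fast policy described above, where any filesystem exception aborts the server rather than risking writes against a filesystem in an unknown state, can be sketched in a few lines. All names here are illustrative; none are actual HBase classes.]

```java
import java.io.IOException;

public class FailFastSketch {
    // Hypothetical stand-in for any operation that touches HDFS.
    interface FileSystemOp {
        void run() throws IOException;
    }

    static boolean aborted = false;

    // Fail-fast policy: any filesystem exception stops the server
    // rather than retrying, trading availability for data integrity.
    static void execute(FileSystemOp op) {
        if (aborted) {
            return; // already shutting down; refuse further work
        }
        try {
            op.run();
        } catch (IOException e) {
            aborted = true; // a real region server would begin shutdown here
        }
    }

    public static void main(String[] args) {
        execute(() -> { throw new IOException("Unable to create new block"); });
        System.out.println("aborted=" + aborted);
    }
}
```

The HBASE-2183 work described above amounts to replacing this blanket policy with per-code-path decisions about when a retry or ride-over is safe instead.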