The short answer is you need more HDFS datanodes. It's a question of trying to do too much at peak load with too few cluster resources.
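For context (my addition, not something stated in the original thread): with only 3 datanodes and HDFS's default replication factor of 3, every block write pipeline must use all three nodes, so there is no spare datanode to substitute if one of them is overloaded during a heavy load, which is one common way to end up at "Unable to create new block". Beyond adding datanodes, a stopgap some people use on small test clusters is lowering replication, at the cost of durability. A sketch of that setting in hdfs-site.xml:

```
<!-- hdfs-site.xml: sketch only; lowering replication trades durability
     for headroom on a 3-node cluster and is not a substitute for adding
     datanodes as recommended above. -->
<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>
```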
> Brief reminder: I have a small cluster, 3 regionservers
> (+datanodes), 1 master (+namenode).
> We perform a massive load of data into hbase every few
> minutes.

I think you can see how these two things are in conflict with each other.

> 2010-04-17 10:08:07,270 WARN
> org.apache.hadoop.hdfs.DFSClient: DataStreamer
> Exception: java.io.IOException: Unable to create new
> block.

This is an (un)helpful message providing a fairly clear indication that you need to increase the resources available in your cluster so it can handle the peak loads you are imposing on it.

The longer answer, at least for HBase, is HBASE-2183 (Ride Over Restart). My team at Trend Micro will, over the next couple of months, work to make HBase more resilient to problems at the HDFS layer. Currently HBase region servers shut down in most cases when they take exceptions from the filesystem. This is a simple and effective strategy for avoiding integrity and corruption problems, so when you see the region servers shut down on your cluster because you are overstressing HDFS, they are functioning correctly. However, we want to go over the code paths which touch the filesystem and decide on a case-by-case basis whether there is an alternative which can improve the overall availability of the service should something happen on a given arc.

Best regards,

   - Andy
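[Editor's note, not part of the original reply: the fail-fast policy described above, where any filesystem exception aborts the server rather than risking writes against a filesystem in an unknown state, can be sketched in a few lines. All names here are illustrative; none are actual HBase classes.]

```java
import java.io.IOException;

public class FailFastSketch {
    // Hypothetical stand-in for any operation that touches HDFS.
    interface FileSystemOp {
        void run() throws IOException;
    }

    static boolean aborted = false;

    // Fail-fast policy: any filesystem exception stops the server
    // rather than retrying, trading availability for data integrity.
    static void execute(FileSystemOp op) {
        if (aborted) {
            return; // already shutting down; refuse further work
        }
        try {
            op.run();
        } catch (IOException e) {
            aborted = true; // a real region server would begin shutdown here
        }
    }

    public static void main(String[] args) {
        execute(() -> { throw new IOException("Unable to create new block"); });
        System.out.println("aborted=" + aborted);
    }
}
```

The HBASE-2183 work described above amounts to replacing this blanket policy with per-code-path decisions about when a retry or ride-over is safe instead.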