I'd like to propose a vote on committing HDFS-630 to 0.21 (it has already been committed to TRUNK).
HDFS-630 has the DFSClient pass the namenode the names of datanodes it has determined to be dead, for example because a connection attempt to them failed. This is useful in the interval between a datanode dying and the namenode timing out its lease. Without this fix, the namenode can often give out the dead datanode as a host for a block. If the cluster is small, fewer than 5 or 6 nodes, it is very likely that the namenode will give out the dead datanode as a block host.

Small clusters are common in HBase, especially when folks are starting out or evaluating HBase. They'll start with three or four nodes, each carrying both a datanode and an HBase regionserver. They'll experiment with killing one of the slaves (datanode and regionserver) and watch what happens. What follows is a struggling DFSClient trying to create replicas where one of the datanodes passed to it by the namenode is dead. The DFSClient will fail and then go back to the namenode again, etc. (see https://issues.apache.org/jira/browse/HBASE-1876 for a more detailed blow-by-blow). HBase operation is held up during this time, and eventually a regionserver will shut itself down to protect against data loss if it can't successfully write to HDFS.

Thanks all,
St.Ack