+1

Nicholas Sze
----- Original Message ----
> From: Stack <st...@duboce.net>
> To: hdfs-...@hadoop.apache.org
> Cc: HBase Dev List <hbase-dev@hadoop.apache.org>
> Sent: Thu, January 21, 2010 2:36:25 PM
> Subject: [VOTE -- Round 2] Commit hdfs-630 to 0.21?
>
> I'd like to propose a new vote on having hdfs-630 committed to 0.21.
> The first vote on this topic, initiated 12/14/2009, was sunk when Tsz Wo
> (Nicholas) Sze suggested improvements. Those suggestions have since
> been folded into a new version of the hdfs-630 patch. It's this new
> version of the patch -- 0001-Fix-HDFS-630-0.21-svn-2.patch -- that I'd
> like us to vote on. For background on why we -- the hbase community
> -- think hdfs-630 important, see the notes below from the original
> call-to-vote.
>
> I'm obviously +1.
>
> Thanks for your consideration,
> St.Ack
>
> P.S. Regarding TRUNK, after chatting with Nicholas, TRUNK was cleaned of
> the previous versions of hdfs-630 and we'll likely apply
> 0001-Fix-HDFS-630-trunk-svn-4.patch, a version of
> 0001-Fix-HDFS-630-0.21-svn-2.patch that works for TRUNK and includes
> Nicholas's suggestions.
>
>
> On Mon, Dec 14, 2009 at 9:56 PM, stack wrote:
> > I'd like to propose a vote on having hdfs-630 committed to 0.21 (It's already
> > been committed to TRUNK).
> >
> > hdfs-630 has the dfsclient pass the namenode the names of datanodes
> > it's determined dead because it got a failed connection when it tried to
> > contact them, etc. This is useful in the interval between a datanode dying and
> > the namenode timing out its lease. Without this fix, the namenode can often
> > give out the dead datanode as a host for a block. If the cluster is small,
> > less than 5 or 6 nodes, then it's very likely the namenode will give out the dead
> > datanode as a block host.
> >
> > Small clusters are common in hbase, especially when folks are starting out
> > or evaluating hbase. They'll start with three or four nodes carrying both
> > datanodes+hbase regionservers. They'll experiment with killing one of the slaves
> > -- datanodes and regionserver -- and watch what happens. What follows is a
> > struggling dfsclient trying to create replicas where one of the datanodes
> > passed to us by the namenode is dead. DFSClient will fail and then go back to
> > the namenode again, etc. (See
> > https://issues.apache.org/jira/browse/HBASE-1876 for a more detailed
> > blow-by-blow). HBase operation will be held up during this time and
> > eventually a regionserver will shut itself down to protect itself against
> > data loss if we can't successfully write to HDFS.
> >
> > Thanks all,
> > St.Ack