On Tue, Jul 17, 2012 at 7:14 PM, N Keywal <[email protected]> wrote:
>> I suppose as Todd suggests, we could do this client side. The extra
>> state would complicate NN (making it difficult to get such a change
>
> After some iterations I came to a solution close to his proposition,
> mentioned in my mail from yesterday.
> To me we should fix this, and this includes HBASE-6401. The question
> is mainly on which HDFS branch HBase would need it, as HDFS code
> changed between the 1.0.3 release and branch 2. HADOOP-8144 is
> also important for people configuring the topology imho.
>
Yes. Needs to be fixed for 1.0 and 2.0. This is ugly, but could we have an HBase-modified DFSClient load ahead of the Hadoop one on the CLASSPATH so we could get the fix in earlier? (Maybe it's worth starting an HBaseDFSClient effort if there is a list of particular behaviors we want, such as the proposed reordering of the replicas given to us by the namenode, socket timeouts that differ depending on who is opening the DFSInput/OutputStream, etc.)

We should work on getting fixes into Hadoop in the meantime, because an HBase DFSClient won't help with the intra-DN traffic timeouts. It's kinda silly the way we can repeatedly time out on a DN that we know elsewhere is dead while, in the meantime, data is offline. It's kind of an important one to fix, I'd say.

>> in). The API to mark a DN dead seems like a nice-to-have. Master or
>> client could pull on it when it knows a server is dead (not just the RS).
>
> Yes, there is a mechanism today to tell the NN to decommission a DN,
> but it's complex: we need to write a file with the 'unwanted' nodes,
> and we need to tell the NN to reload it. Not really a 'mark as dead'
> function.

Yeah. I remember that bit of messing now. It's useful when ops want to decommission a node, but it does not serve this particular need. It's a bit of a tough one, though, in that the NN would have to 'trust' the client that is pulling on this new API to say a DN is down.

St.Ack
