[ http://issues.apache.org/jira/browse/HADOOP-163?page=comments#action_12412565 ]
Bryan Pendleton commented on HADOOP-163:
----------------------------------------

Sounds good - except, what "aborts"? The idea of the datanode staying operational, but reporting the error and not accepting further blocks, is probably better - though maybe you meant "abort the block write". The namenode should probably not count the node's blocks toward new allocations, but they should still be available as a source for replication.

Also, staying up means fewer timeouts. It used to be that, when writing large volumes into DFS, if one or more of your nodes was full, your writer would hit periodic timeouts as connections to the (full and constantly restarting) datanode were refused. Hitting a timeout because some fraction of the cluster's resources is exhausted is, of course, much *much* slower than continuing to stream.

Further, if the datanode periodically re-tests whether the error condition has lifted, it can begin contributing to the cluster's productivity again that much sooner.

> If a DFS datanode cannot write onto its file system, it should tell the name
> node not to assign new blocks to it.
> -----------------------------------------------------------------------------
>
>          Key: HADOOP-163
>          URL: http://issues.apache.org/jira/browse/HADOOP-163
>      Project: Hadoop
>         Type: Bug
>   Components: dfs
>     Versions: 0.2
>     Reporter: Runping Qi
>     Assignee: Hairong Kuang
>      Fix For: 0.3
>
> I observed that sometimes, if a file system of a data node is not mounted
> properly, it may not be writable. In this case, any data writes will fail.
> The name node should stop assigning new blocks to that data node. The web
> page should show that the node is in an abnormal state.

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly, contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira
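[Editor's note: the following is a minimal, hypothetical sketch of the behavior Bryan describes - a datanode that keeps running when its storage becomes unwritable, stops accepting new blocks, and periodically re-tests whether the condition has lifted. The class and method names (DiskWritabilityMonitor, isAcceptingBlocks, probeWrite) are illustrative assumptions, not Hadoop's actual DataNode API.]

    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.IOException;

    /**
     * Hypothetical sketch (not Hadoop's real DataNode code): a background
     * probe that checks whether the local storage directory is still
     * writable. While the probe fails, the node keeps serving the blocks it
     * already has but marks itself as not accepting new ones; once the probe
     * succeeds again, the flag is cleared so block assignment can resume.
     */
    public class DiskWritabilityMonitor implements Runnable {

        private final File storageDir;          // data directory to probe
        private final long checkIntervalMillis; // how often to re-test
        private volatile boolean acceptingBlocks = true;

        public DiskWritabilityMonitor(File storageDir, long checkIntervalMillis) {
            this.storageDir = storageDir;
            this.checkIntervalMillis = checkIntervalMillis;
        }

        /** True if the last probe succeeded; a real datanode would report this
         *  state to the namenode (e.g. with its heartbeat) - assumption here. */
        public boolean isAcceptingBlocks() {
            return acceptingBlocks;
        }

        /** Try to create, write, and sync a small probe file in the storage directory. */
        private boolean probeWrite() {
            File probe = new File(storageDir, ".write_probe");
            try (FileOutputStream out = new FileOutputStream(probe)) {
                out.write(0);
                out.getFD().sync();
                return true;
            } catch (IOException e) {
                return false;
            } finally {
                probe.delete();
            }
        }

        @Override
        public void run() {
            while (!Thread.currentThread().isInterrupted()) {
                boolean writable = probeWrite();
                if (writable != acceptingBlocks) {
                    acceptingBlocks = writable;
                    // State change: in practice this would be surfaced to the
                    // namenode and the web UI rather than just logged locally.
                    System.out.println("Storage " + storageDir + " writable=" + writable);
                }
                try {
                    Thread.sleep(checkIntervalMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }
    }

The key design point, matching the comment above, is that a write failure flips a flag instead of shutting the process down, so existing blocks stay available for reads and replication, and recovery is automatic once the disk is writable again.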
