[ https://issues.apache.org/jira/browse/HDFS-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Todd Lipcon resolved HDFS-1234.
-------------------------------

    Resolution: Duplicate

Resolved by HDFS-630

> Datanode 'alive' but with its disk failed, Namenode thinks it's alive
> ---------------------------------------------------------------------
>
>                 Key: HDFS-1234
>                 URL: https://issues.apache.org/jira/browse/HDFS-1234
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.20.1
>            Reporter: Thanh Do
>
> - Summary: Datanode 'alive' but with its disk failed, Namenode still thinks it's alive
>
> - Setups:
> + Replication = 1
> + # available datanodes = 2
> + # disks / datanode = 1
> + # failures = 1
> + Failure type = bad disk
> + When/where failure happens = first phase of the pipeline
>
> - Details:
> In this experiment we have two datanodes, each with one disk.
> If one datanode's disk fails while the node itself stays alive, the
> datanode does not keep track of the failure. From the namenode's
> perspective that datanode is still alive, so the namenode keeps handing
> the same datanode back to the client. The client retries 3 times, each
> time asking the namenode for a new set of datanodes, and always gets the
> same datanode; every attempt to write there throws an exception.
> This bug was found by our Failure Testing Service framework:
> http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
> For questions, please email us: Thanh Do (than...@cs.wisc.edu) and
> Haryadi Gunawi (hary...@eecs.berkeley.edu)

--
This message is automatically generated by JIRA.
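The failure mode reported in HDFS-1234 can be illustrated with a minimal Python sketch. Every name in it (DataNode, NameNode, choose_target, write_with_retries, DiskFailedError) is hypothetical and is not a Hadoop API. The sketch makes three assumptions. First, the namenode judges liveness from heartbeats alone. Second, a datanode process keeps sending heartbeats after its only disk fails. Third, placement is deterministic and always prefers the same node. Under these assumptions the client's 3 retries all land on the broken datanode. The exclude_failed flag models one possible client-side mitigation, in which the client skips a datanode after a failed write. It is an illustration only and does not describe the change made in HDFS-630.

```python
# Toy model of the failure described in HDFS-1234.
# All names are hypothetical; this is not Hadoop code.
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


class DiskFailedError(IOError):
    """Raised when a write reaches a datanode whose only disk has failed."""


@dataclass
class DataNode:
    name: str
    disk_ok: bool = True  # the process stays alive even when this is False

    def heartbeat(self) -> bool:
        # Assumption: liveness is judged by heartbeats alone, and the
        # process keeps sending them after its disk fails.
        return True

    def write_block(self, data: bytes) -> None:
        if not self.disk_ok:
            raise DiskFailedError(f"{self.name}: disk failed")


@dataclass
class NameNode:
    datanodes: List[DataNode]

    def choose_target(self, excluded: FrozenSet[str] = frozenset()) -> DataNode:
        # Assumed deterministic policy: first live, non-excluded node.
        # Replication = 1, so a single target is returned.
        for dn in self.datanodes:
            if dn.heartbeat() and dn.name not in excluded:
                return dn
        raise RuntimeError("no datanode available")


def write_with_retries(nn: NameNode, data: bytes, retries: int = 3,
                       exclude_failed: bool = False) -> Tuple[List[str], bool]:
    """Return (names of datanodes tried, whether the write succeeded)."""
    excluded = set()
    tried: List[str] = []
    for _ in range(retries):
        target = nn.choose_target(frozenset(excluded))
        tried.append(target.name)
        try:
            target.write_block(data)
            return tried, True
        except DiskFailedError:
            # Hypothetical mitigation: remember the bad node and ask the
            # namenode to skip it on the next attempt.
            if exclude_failed:
                excluded.add(target.name)
    return tried, False


if __name__ == "__main__":
    nodes = [DataNode("dn1", disk_ok=False), DataNode("dn2")]
    nn = NameNode(nodes)
    print(write_with_retries(nn, b"block"))
    # (['dn1', 'dn1', 'dn1'], False)
    print(write_with_retries(nn, b"block", exclude_failed=True))
    # (['dn1', 'dn2'], True)
```

With exclusion disabled, the sketch reproduces the reported symptom: all three attempts go to dn1 and the write fails. With exclusion enabled, the second attempt reaches the healthy dn2.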