[ https://issues.apache.org/jira/browse/HDFS-1237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Todd Lipcon resolved HDFS-1237.
-------------------------------

Resolution: Invalid

If both DNs crash in a pipeline of 2 DNs, of course the pipeline does not recover. The likelihood of correlated failure of all nodes in a pipeline is very small since one of the replicas is off-rack.
Please reopen if you think there's _any_ action the client could take to recover when the entire pipeline has crashed.

> Client logic for 1st phase and 2nd phase failover are different
> ---------------------------------------------------------------
>
>                 Key: HDFS-1237
>                 URL: https://issues.apache.org/jira/browse/HDFS-1237
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs client
>    Affects Versions: 0.20.1
>            Reporter: Thanh Do
>
> - Setup:
> number of datanodes = 4
> replication factor = 2 (2 datanodes in the pipeline)
> number of failures injected = 2
> failure type: crash
> Where/When failures happen: There are two scenarios. First, when two datanodes crash at the same time in the first phase of the pipeline. Second, when two datanodes crash in the second phase of the pipeline.
>
> - Details:
>
> In this setting, we set the datanode-to-namenode heartbeat interval to 1 second. This is just to show that if the NN has declared a datanode dead, the DFSClient will not get that dead datanode from the server. Here are our observations:
>
> 1. If the two crashes happen during the first phase, the client waits for 6 seconds (which is enough time for the NN to detect dead datanodes in this setting). After waiting for 6 seconds, the client asks the NN again, the NN is able to give it two fresh, healthy datanodes, and the experiment is successful!
>
> 2. BUT, if the two crashes happen during the second phase (e.g. renameTo), the client *never waits for 6 secs*, which implies that the client's logic for the 1st and 2nd phases is different.
> What happens here is that DFSClient gives up and (we believe) never falls back to the outer while loop to contact the NN again. So the two crashes in this second phase are not masked properly, and the write operation fails.
>
> In summary, scenario (1) is good, but scenario (2) is not successful. This shows bad retry logic during the second phase. (We note again that we changed the setup a bit by setting the DN's heartbeat interval to 1 second. If we use the default interval, scenario (1) will fail too, because the NN will give the client the same dead datanodes.)
>
> This bug was found by our Failure Testing Service framework:
> http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
> For questions, please email us: Thanh Do (than...@cs.wisc.edu) and
> Haryadi Gunawi (hary...@eecs.berkeley.edu)
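For readers who want to see the retry pattern the reporters observed in scenario (1) and expected in scenario (2), here is a minimal simulation. The idea: if any datanode in the pipeline is dead, back off long enough for the NN to notice the crash, then ask the NN for a fresh pipeline. Everything here is hypothetical (`FakeNameNode`, `SimClock`, `build_pipeline`, the 5-second detection delay). It is not DFSClient code and does not use any HDFS API. It only mirrors the setup in the report: 4 datanodes, replication 2, and a 6-second back-off.

```python
from dataclasses import dataclass, field


class SimClock:
    """Simulated clock so the back-off can be tested without real sleeping."""

    def __init__(self) -> None:
        self.now = 0.0

    def sleep(self, seconds: float) -> None:
        self.now += seconds


@dataclass
class FakeNameNode:
    """Hypothetical stand-in for the NN (not HDFS code).

    A crashed datanode keeps being handed out until `detection_delay`
    simulated seconds have passed since its crash, mimicking heartbeat-based
    dead-node detection.
    """

    datanodes: list
    detection_delay: float
    crashed_at: dict = field(default_factory=dict)

    def crash(self, dn: str, now: float) -> None:
        self.crashed_at[dn] = now

    def is_alive(self, dn: str) -> bool:
        return dn not in self.crashed_at

    def allocate_pipeline(self, now: float, replication: int) -> list:
        # Only nodes whose crash the NN has already detected are excluded.
        candidates = [
            dn for dn in self.datanodes
            if dn not in self.crashed_at
            or now - self.crashed_at[dn] < self.detection_delay
        ]
        return candidates[:replication]


def build_pipeline(nn: FakeNameNode, clock: SimClock, replication: int = 2,
                   backoff: float = 6.0, max_retries: int = 3) -> list:
    """Hypothetical client step shared by both write phases.

    If any node in the returned pipeline is dead, wait `backoff` seconds so
    the NN can detect the crash, then ask the NN for a fresh pipeline.
    """
    for _ in range(max_retries + 1):
        pipeline = nn.allocate_pipeline(clock.now, replication)
        if len(pipeline) == replication and all(nn.is_alive(dn) for dn in pipeline):
            return pipeline
        clock.sleep(backoff)
    raise IOError("could not obtain a healthy pipeline from the NN")


if __name__ == "__main__":
    nn = FakeNameNode(["dn1", "dn2", "dn3", "dn4"], detection_delay=5.0)
    clock = SimClock()
    first = build_pipeline(nn, clock)      # ['dn1', 'dn2']
    nn.crash("dn1", clock.now)             # both pipeline nodes crash together
    nn.crash("dn2", clock.now)
    second = build_pipeline(nn, clock)     # ['dn3', 'dn4'] after one 6 s back-off
    print(first, second, clock.now)
```

The sketch only models pipeline allocation. It does not model recovering the data already sent to the crashed pipeline, which is the concern raised in the resolution above.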