Client logic for 1st phase and 2nd phase failover are different
---------------------------------------------------------------
Key: HDFS-1237
URL: https://issues.apache.org/jira/browse/HDFS-1237
Project: Hadoop HDFS
Issue Type: Bug
Components: hdfs client
Affects Versions: 0.20.1
Reporter: Thanh Do

- Setup:
number of datanodes = 4
replication factor = 2 (2 datanodes in the pipeline)
number of failures injected = 2
failure type: crash
Where/when failures happen: there are two scenarios. In the first, two datanodes crash at the same time during the first phase of the pipeline. In the second, two datanodes crash during the second phase of the pipeline.

- Details:
In this setting, we set the datanode's heartbeat interval to the namenode to 1 second. This is just to show that once the NN has declared a datanode dead, the DFSClient will not get that dead datanode from the server. Here are our observations:

1. If the two crashes happen during the first phase, the client waits for 6 seconds (which is enough time for the NN to detect dead datanodes in this setting). After waiting for 6 seconds, the client asks the NN again, and the NN is able to give it two fresh, healthy datanodes. The experiment is successful!

2. BUT, if the two crashes happen during the second phase (e.g. renameTo), the client *never waits for 6 secs*, which implies that the client logic for the 1st phase and the 2nd phase is different. What happens here is that the DFSClient gives up and (we believe) never falls back to the outer while loop to contact the NN again. So the two crashes in this second phase are not masked properly, and the write operation fails.

In summary, scenario (1) is good, but scenario (2) is not successful. This shows bad retry logic during the second phase.

(We note again that we changed the setup a bit by setting the DN's heartbeat interval to 1 second. If we use the default interval, scenario (1) will fail too, because the NN will give the client the same dead datanodes.)
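
The difference between the two scenarios can be illustrated with a small toy model. This is only a sketch of the behavior as we understand it from the observations above, not the actual DFSClient code: the class `PipelineRetrySketch`, the `ToyNameNode` helper, and the `retrySecondPhase` flag are all hypothetical names, and the 6-second wait is replaced by a direct call that lets the toy NN notice the crashed nodes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model of the behavior described in this report. All names are
// hypothetical; none of this is actual DFSClient or NameNode code.
class PipelineRetrySketch {

    // Toy namenode: hands out datanodes it has not yet declared dead.
    static class ToyNameNode {
        private final List<String> datanodes = Arrays.asList("dn1", "dn2", "dn3", "dn4");
        private final Set<String> crashed = new HashSet<>();
        private final Set<String> declaredDead = new HashSet<>();

        void crash(List<String> nodes) { crashed.addAll(nodes); }

        // Stands in for the NN noticing missed heartbeats.
        void detectDeadNodes() { declaredDead.addAll(crashed); }

        boolean isAlive(String dn) { return !crashed.contains(dn); }

        List<String> allocatePipeline(int replication) {
            List<String> pipeline = new ArrayList<>();
            for (String dn : datanodes) {
                if (pipeline.size() < replication && !declaredDead.contains(dn)) {
                    pipeline.add(dn);
                }
            }
            return pipeline;
        }
    }

    static boolean allAlive(ToyNameNode nn, List<String> pipeline) {
        for (String dn : pipeline) {
            if (!nn.isAlive(dn)) return false;
        }
        return true;
    }

    // crashPhase: 1 or 2, the phase in which both pipeline nodes crash.
    // retrySecondPhase: false models the observed behavior (give up);
    // true models falling back to the outer loop (an assumption, not a patch).
    static boolean write(int crashPhase, boolean retrySecondPhase) {
        ToyNameNode nn = new ToyNameNode();
        boolean injected = false;
        for (int attempt = 0; attempt < 3; attempt++) {
            List<String> pipeline = nn.allocatePipeline(2);

            // First phase: on failure, wait and ask the NN again.
            if (crashPhase == 1 && !injected) { nn.crash(pipeline); injected = true; }
            if (!allAlive(nn, pipeline)) {
                nn.detectDeadNodes(); // models the 6-second wait
                continue;
            }

            // Second phase: the observed client gives up here.
            if (crashPhase == 2 && !injected) { nn.crash(pipeline); injected = true; }
            if (!allAlive(nn, pipeline)) {
                if (!retrySecondPhase) return false;
                nn.detectDeadNodes();
                continue;
            }
            return true; // write succeeded
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("crash in phase 1: " + write(1, false));
        System.out.println("crash in phase 2, no fallback: " + write(2, false));
        System.out.println("crash in phase 2, with fallback: " + write(2, true));
    }
}
```

In this model, `write(1, false)` succeeds because the client re-contacts the NN after the wait, while `write(2, false)` fails immediately. `write(2, true)` shows what the same outer-loop fallback would look like for the second phase; whether that is the right fix for the real client is not established here.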
This bug was found by our Failure Testing Service framework:
http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
For questions, please email us: Thanh Do (than...@cs.wisc.edu) and Haryadi Gunawi (hary...@eecs.berkeley.edu)