[ https://issues.apache.org/jira/browse/HDFS-1233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Todd Lipcon resolved HDFS-1233.
-------------------------------
    Resolution: Won't Fix

This is a known deficiency; I don't think anyone has plans to fix it. Any cluster that has multiple disks per DN likely has multiple DNs too.

> Bad retry logic at DFSClient
> ----------------------------
>
>                 Key: HDFS-1233
>                 URL: https://issues.apache.org/jira/browse/HDFS-1233
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs client
>    Affects Versions: 0.20.1
>            Reporter: Thanh Do
>
> - Summary: failover bug; bad retry logic at DFSClient; cannot fail over to the
> second disk
>
> - Setups:
> + # available datanodes = 1
> + # disks / datanode = 2
> + # failures = 1
> + failure type = bad disk
> + when/where failure happens = (see below)
>
> - Details:
> The setup is:
> 1 datanode, 1 replica, and each datanode has 2 disks (Disk1 and Disk2).
>
> We injected a single disk failure to see whether the client can fail over
> to the second disk.
>
> If a persistent disk failure happens during createBlockOutputStream
> (the first phase of pipeline creation), say DN1-Disk1 is bad,
> then createBlockOutputStream (cbos) will get an exception and retry.
> On the retry it gets the same DN1 from the namenode; DN1 then calls
> DN.writeBlock() and FSVolume.createTmpFile(), which ends in
> getNextVolume(), advancing the round-robin volume index. Thus, on the
> second try, the write successfully goes to the second disk.
> So essentially createBlockOutputStream is wrapped in a
> do/while(retry && --count >= 0). The first cbos fails; the second
> succeeds in this particular scenario.
>
> NOW, say cbos is successful, but the failure is persistent.
> Then the "retry" happens in a different while loop.
> First, hasError is set to true in ResponseProcessor.run().
> Thus, DataStreamer.run() goes back to the loop
> while(!closed && clientRunning && !lastPacketInBlock).
> This second iteration of the loop calls
> processDatanodeError because hasError has been set to true.
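To illustrate why the retried cbos lands on the healthy disk, here is a minimal Java sketch of the round-robin volume selection described above. VolumePicker is a hypothetical condensation of the datanode's FSVolumeSet.getNextVolume() behavior, not the actual HDFS code: the index advances on every allocation attempt, so a retry after a failure naturally moves to the other disk.

```java
import java.util.Arrays;
import java.util.List;

public class VolumePicker {
    private final List<String> volumes;
    private int curVolume = 0; // advances on every allocation attempt

    public VolumePicker(List<String> volumes) {
        this.volumes = volumes;
    }

    // Returns the next volume in round-robin order. Even if the caller's
    // write to the returned volume fails, the index has already moved on,
    // so the next attempt targets the other disk.
    public String getNextVolume() {
        String v = volumes.get(curVolume);
        curVolume = (curVolume + 1) % volumes.size();
        return v;
    }

    public static void main(String[] args) {
        VolumePicker p = new VolumePicker(Arrays.asList("Disk1", "Disk2"));
        // First attempt hits Disk1 (bad); the retry hits Disk2 (good).
        System.out.println(p.getNextVolume()); // Disk1
        System.out.println(p.getNextVolume()); // Disk2
    }
}
```

This is why the outer do/while retry in cbos masks the bad disk: the failed attempt itself advances the volume cursor.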
> In processDatanodeError (pde), the client sees that this is the only
> datanode in the pipeline, and hence it concludes that the node is bad,
> although actually only one disk is bad! Hence, pde throws an IOException
> saying that all the datanodes in the pipeline (in this case, only DN1)
> are bad, and the exception is propagated to the client.
> But if the exception is caught by the outermost
> do/while(retry && --count >= 0) loop, then that outer retry will
> succeed (as suggested in the previous paragraph).
>
> In summary, in a deployment scenario with only one datanode that has
> multiple disks, if one disk goes bad, the current retry logic on the
> DFSClient side is not robust enough to mask the failure from the client.
> This bug was found by our Failure Testing Service framework:
> http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
> For questions, please email us: Thanh Do (than...@cs.wisc.edu) and
> Haryadi Gunawi (hary...@eecs.berkeley.edu)

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
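The core of the pde problem can be sketched as follows. This is a hypothetical condensation of the pipeline-recovery decision, not the actual DFSClient code: removing the "bad" node from a single-node pipeline empties the pipeline, so recovery gives up at the node granularity even though only one disk failed.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class PipelineRecovery {
    // Drops the datanode flagged as bad and returns the surviving pipeline.
    // With a one-node pipeline, nothing survives, so an IOException is
    // thrown to the client even if the node merely has one bad disk.
    public static List<String> dropBadNode(List<String> pipeline, int errorIndex)
            throws IOException {
        List<String> remaining = new ArrayList<>(pipeline);
        remaining.remove(errorIndex);
        if (remaining.isEmpty()) {
            throw new IOException("All datanodes " + pipeline + " are bad.");
        }
        return remaining;
    }
}
```

With two or more datanodes the recovery simply continues on the survivors, which matches the resolution comment: clusters with multiple disks per DN typically also have multiple DNs, so node-granularity recovery is usually good enough.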