[ 
https://issues.apache.org/jira/browse/HDFS-1233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HDFS-1233.
-------------------------------

    Resolution: Won't Fix

This is a known deficiency; I don't think anyone has plans to fix it. Any cluster 
that has multiple disks per DN likely has multiple DNs too.

> Bad retry logic at DFSClient
> ----------------------------
>
>                 Key: HDFS-1233
>                 URL: https://issues.apache.org/jira/browse/HDFS-1233
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs client
>    Affects Versions: 0.20.1
>            Reporter: Thanh Do
>
> - Summary: failover bug, bad retry logic at DFSClient, cannot fail over to the 
> 2nd disk
>  
> - Setups:
> + # available datanodes = 1
> + # disks / datanode = 2
> + # failures = 1
> + failure type = bad disk
> + When/where failure happens = (see below)
>  
> - Details:
> The setup is:
> 1 datanode, 1 replica, and each datanode has 2 disks (Disk1 and Disk2).
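>
> A minimal sketch of that setup (illustrative only: the directory paths are
> hypothetical, and the keys are the 0.20-era dfs.data.dir and
> dfs.replication settings):
>
>     import org.apache.hadoop.conf.Configuration;
>
>     // One datanode with a data directory per disk, and a single replica.
>     Configuration conf = new Configuration();
>     conf.set("dfs.data.dir", "/mnt/disk1/dfs/data,/mnt/disk2/dfs/data"); // DN side: two disks
>     conf.setInt("dfs.replication", 1);                                   // one replica per block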
>  
> We injected a single disk failure to see if we can failover to the
> second disk or not.
>  
> If a persistent disk failure happens during createBlockOutputStream
> (the first phase of pipeline creation) (e.g. say DN1-Disk1 is bad),
> then createBlockOutputStream (cbos) will get an exception and it
> will retry! When it retries it will get the same DN1 from the namenode,
> and then DN1 will call DN.writeBlock(), then FSVolume.createTmpFile,
> and finally getNextVolume(), which advances the volume index. Thus, on the
> second try, the write will successfully go to the second disk.
> So essentially createBlockOutputStream is wrapped in a
> do/while(retry && --count >= 0); the first cbos attempt fails and the
> second succeeds in this particular scenario.
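>
> A minimal sketch of that outer retry loop, condensed from the 0.20-era
> DFSClient.nextBlockOutputStream() (details elided; the retry count comes
> from dfs.client.block.write.retries, default 3):
>
>     // Condensed sketch of the client-side block-allocation retry loop.
>     int count = conf.getInt("dfs.client.block.write.retries", 3);
>     boolean retry;
>     do {
>       retry = false;
>       LocatedBlock lb = locateFollowingBlock(System.currentTimeMillis()); // ask the NN for a block
>       DatanodeInfo[] nodes = lb.getLocations();                           // the same DN1 again
>       boolean success = createBlockOutputStream(nodes, clientName, false);
>       if (!success) {
>         namenode.abandonBlock(lb.getBlock(), src, clientName);            // give the block back
>         retry = true;                                                     // and try once more
>       }
>     } while (retry && --count >= 0);
>
> And a condensed sketch of the DN-side round-robin volume selection
> (approximating FSDataset.FSVolumeSet.getNextVolume(); names simplified),
> which is why the retried write lands on the second disk:
>
>     // Each call advances curVolume, so the next createTmpFile() after a
>     // failure on Disk1 is served from Disk2.
>     synchronized FSVolume getNextVolume(long blockSize) throws IOException {
>       int startVolume = curVolume;
>       while (true) {
>         FSVolume volume = volumes[curVolume];
>         curVolume = (curVolume + 1) % volumes.length;   // move to the next disk
>         if (volume.getAvailable() > blockSize) {
>           return volume;
>         }
>         if (curVolume == startVolume) {                 // wrapped around: no usable volume
>           throw new DiskOutOfSpaceException("Insufficient space for an additional block");
>         }
>       }
>     }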
>  
> NOW, say cbos is successful, but the failure is persistent.
> Then the "retry" happens in a different while loop.
> First, hasError is set to true in ResponseProcessor.run()
> (the thread that processes acks from the pipeline).
> Thus, DataStreamer.run() will go back to the loop:
> while(!closed && clientRunning && !lastPacketInBlock).
> This second iteration of the loop will call
> processDatanodeError because hasError has been set to true.
> In processDatanodeError (pde), the client sees that this is the only datanode
> in the pipeline, and hence it considers the node bad, although actually
> only 1 disk is bad! Hence, pde throws an IOException saying that
> all the datanodes (in this case, only DN1) in the pipeline are bad,
> and this exception is thrown to the client.
> If that exception were caught by the outermost
> do/while(retry && --count >= 0) loop, the outer retry would succeed
> (as suggested in the previous paragraph); since it is not, the
> failure is exposed to the client.
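>
> A rough sketch of the single-datanode branch described above, condensed
> from the 0.20-era processDatanodeError() (names abbreviated, surrounding
> pipeline-recovery logic elided):
>
>     // With only one DN in the pipeline there is nothing left to rebuild
>     // the pipeline from, so the whole node is given up on even though
>     // only one of its disks failed.
>     if (nodes.length <= 1) {
>       lastException = new IOException("All datanodes " + pipelineMsg +
>                                       " are bad. Aborting...");
>       closed = true;                  // the error propagates to the writer
>       if (streamer != null) {
>         streamer.close();
>       }
>       return false;                   // no further recovery is attempted
>     }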
>  
> In summary, if a deployment has only one datanode
> with multiple disks and one of those disks goes bad, the current
> retry logic on the DFSClient side is not robust enough to mask the
> failure from the client.
> This bug was found by our Failure Testing Service framework:
> http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
> For questions, please email us: Thanh Do (than...@cs.wisc.edu) and 
> Haryadi Gunawi (hary...@eecs.berkeley.edu)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
