Some details can be found in appendDesign3.pdf:
https://issues.apache.org/jira/secure/attachment/12445209/appendDesign3.pdf

Thanks,
Thanh

On Mon, Jan 3, 2011 at 2:49 AM, Sean Bigdatafun <sean.bigdata...@gmail.com> wrote:
> I'd like to understand how HDFS handles Datanode failure gracefully. Let's
> suppose a replication factor of 3 is used in HDFS for this discussion.
>
> After the DataStreamer receives a list of Datanodes A, B, C for a block, it
> starts pulling data packets off the 'data queue' and putting them onto the
> 'ack queue' after sending them over the wire to those Datanodes (using a
> pipeline mechanism Client -> A -> B -> C). If Datanode B crashes during the
> write, why does the client need to put the data packets in the 'ack queue'
> back onto the 'data queue'? (How can the client guarantee the order of the
> resent packets on Datanode A, after all?)
> I guess I have not fully understood the write-failure handling mechanism
> yet. Can someone give a detailed explanation?
>
> Thanks,
> --Sean
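In outline: packets on the ack queue have been sent but not yet acknowledged by all three Datanodes, so after the pipeline is rebuilt they must be resent, and ordering is preserved because they are pushed back onto the *front* of the data queue in sequence-number order (Datanodes that already hold some of those bytes can recognize them by sequence number / block offset). Below is a rough, simplified sketch of that requeue step; the class and method names are hypothetical and this is not the actual DFSOutputStream code, just an illustration of the two-queue bookkeeping:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of the client-side queue handling; names loosely
// mirror DFSOutputStream's DataStreamer but this is NOT the real HDFS code.
public class PipelineRecoverySketch {

    static class Packet {
        final long seqno;           // monotonically increasing per block
        Packet(long seqno) { this.seqno = seqno; }
    }

    final Deque<Packet> dataQueue = new ArrayDeque<>(); // not yet sent
    final Deque<Packet> ackQueue  = new ArrayDeque<>(); // sent, awaiting acks

    void enqueue(Packet p) { dataQueue.addLast(p); }

    // DataStreamer: as a packet is sent down the pipeline
    // Client -> A -> B -> C, it moves from the data queue to the ack queue.
    Packet sendNext() {
        Packet p = dataQueue.pollFirst();
        if (p != null) ackQueue.addLast(p);
        return p;
    }

    // ResponseProcessor: a packet leaves the ack queue only once every
    // Datanode in the pipeline has acknowledged it.
    void ackUpTo(long seqno) {
        while (!ackQueue.isEmpty() && ackQueue.peekFirst().seqno <= seqno) {
            ackQueue.pollFirst();
        }
    }

    // On pipeline failure (e.g. B crashes): unacked packets are pushed back
    // onto the FRONT of the data queue, newest first, so the queue ends up
    // oldest-first again and the resend order equals the original send order.
    void onPipelineFailure() {
        while (!ackQueue.isEmpty()) {
            dataQueue.addFirst(ackQueue.pollLast());
        }
    }
}
```

So the order question answers itself: because the requeue prepends the unacked packets ahead of everything still unsent, the stream of sequence numbers the (rebuilt) pipeline sees is the same as before the failure.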