[ https://issues.apache.org/jira/browse/HDFS-11755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Nathan Roberts updated HDFS-11755:
----------------------------------
    Status: Patch Available  (was: Open)

v1 of trunk patch. branch 2 will require a separate patch.

> Underconstruction blocks can be considered missing
> --------------------------------------------------
>
>                 Key: HDFS-11755
>                 URL: https://issues.apache.org/jira/browse/HDFS-11755
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 3.0.0-alpha2, 2.8.1
>            Reporter: Nathan Roberts
>            Assignee: Nathan Roberts
>         Attachments: HDFS-11755.001.patch
>
>
> Following sequence of events can lead to a block underconstruction being
> considered missing.
> - pipeline of 3 DNs, DN1->DN2->DN3
> - DN3 has a failing disk so some updates take a long time
> - Client writes entire block and is waiting for final ack
> - DN1, DN2 and DN3 have all received the block
> - DN1 is waiting for ACK from DN2 who is waiting for ACK from DN3
> - DN3 is having trouble finalizing the block due to the failing drive. It
> does eventually succeed but it is VERY slow at doing so.
> - DN2 times out waiting for DN3 and tears down its pieces of the pipeline, so
> DN1 notices and does the same. Neither DN1 nor DN2 finalized the block.
> - DN3 finally sends an IBR to the NN indicating the block has been received.
> - Drive containing the block on DN3 fails enough that the DN takes it offline
> and notifies NN of failed volume
> - NN removes DN3's replica from the triplets and then declares the block
> missing because there are no other replicas
> Seems like we shouldn't consider uncompleted blocks for replication.
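
The sequence of events above can be replayed with a toy model of the NN's bookkeeping. This is only an illustrative sketch, not HDFS code and not the attached patch: `Block`, `BlockState`, `naive_is_missing`, `guarded_is_missing`, `replay_scenario` and the block id `blk_1` are all hypothetical names, and the model assumes the block stays under construction because the client never received its final ack.

```python
from dataclasses import dataclass, field
from enum import Enum


class BlockState(Enum):
    UNDER_CONSTRUCTION = "under_construction"
    COMPLETE = "complete"


@dataclass
class Block:
    """Toy stand-in for the NN's view of one block (hypothetical)."""
    block_id: str
    state: BlockState = BlockState.UNDER_CONSTRUCTION
    replicas: set = field(default_factory=set)  # DNs that reported a replica


def naive_is_missing(block):
    """Treat any block with zero reported replicas as missing."""
    return len(block.replicas) == 0


def guarded_is_missing(block):
    """Only completed blocks are eligible to be declared missing."""
    return block.state is BlockState.COMPLETE and len(block.replicas) == 0


def replay_scenario():
    """Replay the reported events against the toy model."""
    blk = Block("blk_1")  # hypothetical id; block is still under construction
    # DN2 times out on DN3; DN1 and DN2 tear down without finalizing,
    # so neither of them ever reports a replica to the NN.
    # DN3 eventually finalizes and sends an IBR for the block.
    blk.replicas.add("DN3")
    # DN3 takes the failing volume offline; the NN drops DN3's replica.
    blk.replicas.discard("DN3")
    return blk


if __name__ == "__main__":
    blk = replay_scenario()
    print("naive check says missing:  ", naive_is_missing(blk))
    print("guarded check says missing:", guarded_is_missing(blk))
```

In this model the naive check flags the block as missing, while the guarded check leaves it alone because the block was never completed. That matches the suggestion at the end of the report, under the assumptions stated above.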