Kihwal Lee commented on HDFS-12070:

It would certainly be better if we could close the file right away.  The design 
doc specified adding one more stage for safety, since the state of a stage-2 
failed replica is unknown. Let's examine whether any "unknown state" causes an 
issue after being excluded in the commit/close.

If a stage-2 failure occurs before updating the gen stamp, there is no issue. 
So we can safely assume that closing right away creates no problems for early 
stage-2 failure cases.  After a series of checks, the commit/close path does 
the following:

1) the in-memory gen stamp of the RUR replica is updated.
2) the meta file is renamed accordingly.
3) the data file is truncated if necessary.
4) the size is set in the in-memory RUR replica.
5) the replica is finalized (moved under the finalized dir, a new ReplicaInfo 
is created) and added to the replica map.
6) the replica files are checked (the moves already succeeded; this is 
double-checking).
7) a RECEIVED IBR is explicitly sent. This is asynchronous, fire-and-forget.
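The steps above can be sketched as a toy state machine. This is NOT the real 
FsDatasetImpl code; every name below is hypothetical. It only illustrates the 
invariant the failure analysis relies on: a failure before step 5) leaves the 
replica in RUR and out of the replica map, so it is never reported to the NN 
as a live block.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of the commit/close sequence (hypothetical names throughout). */
class RecoveryCloseSketch {
    enum State { RUR, FINALIZED }

    static class Replica {
        State state = State.RUR;     // replica under recovery
        long genStamp;
        boolean metaRenamed = false; // stands in for the on-disk meta rename
        Replica(long gs) { genStamp = gs; }
    }

    /**
     * Runs the commit/close steps. crashAtStep simulates a failure just
     * before that step (0 = no failure). Returns the resulting state.
     */
    static State commitClose(Replica r, long newGs, int crashAtStep,
                             Map<Long, Replica> replicaMap, long blockId) {
        if (crashAtStep == 1) return r.state;
        r.genStamp = newGs;            // 1) update in-memory gen stamp
        if (crashAtStep == 2) return r.state;
        r.metaRenamed = true;          // 2) rename the meta file on disk
        if (crashAtStep == 3) return r.state;
        /* 3) truncate the data file if necessary (no-op in this model) */
        if (crashAtStep == 4) return r.state;
        /* 4) set the new size on the in-memory replica (elided) */
        if (crashAtStep == 5) return r.state;
        r.state = State.FINALIZED;     // 5) finalize and add to replica map
        replicaMap.put(blockId, r);
        /* 6) double-check files and 7) send the RECEIVED IBR (elided) */
        return r.state;
    }
}
```

A failure injected anywhere before step 5) leaves the replica in RUR and 
absent from the map, which is exactly why those failure windows are harmless 
in the analysis below.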

- A failure in 1) or right after 1) is not an issue. The replica will stay as 
RUR until the DN restarts, so it won't be reported to the NN. After the DN 
restarts, it will turn into RWR/RBW and the gen stamp will revert to what's on 
disk. It will get cleaned up.

- A failure between 2) and 4) can update the on-disk gen stamp to the 
committed one, but the replica will stay as RUR.

- A failure between 5) and 6) might leave the on-disk data inconsistent and 
unusable. If it fails before any rename/move, the files remain in the rbw 
directory and the replica stays RUR in memory; the rest is the same as the 
above case. If only one rename fails, the in-memory state won't reflect the 
on-disk state, but since it stays as RUR, it won't be mixed into the normal 
block locations, and upon DN restart the replica will not be loaded.  If it 
fails after successful renames, the replica will be loaded and reported as 
FINALIZED when the DN restarts, at which point it will be an excess replica. 
There is no effect on data consistency or availability.

- A failure in 7) causes temporary under-replication. Closing right away does 
not make it any worse than the retry approach.

In summary, it is safe to close right away.

> Failed block recovery leaves files open indefinitely and at risk for data loss
> ------------------------------------------------------------------------------
>                 Key: HDFS-12070
>                 URL: https://issues.apache.org/jira/browse/HDFS-12070
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.0.0-alpha
>            Reporter: Daryn Sharp
>            Assignee: Kihwal Lee
>            Priority: Major
>         Attachments: HDFS-12070.0.patch, lease.patch
> Files will remain open indefinitely if block recovery fails which creates a 
> high risk of data loss.  The replication monitor will not replicate these 
> blocks.
> The NN provides the primary node a list of candidate nodes for recovery which 
> involves a 2-stage process. The primary node removes any candidates that 
> cannot init replica recovery (essentially alive and knows about the block) to 
> create a sync list.  Stage 2 issues updates to the sync list – _but fails if 
> any node fails_ unlike the first stage.  The NN should be informed of nodes 
> that did succeed.
> Manual recovery will also fail until the problematic node is temporarily 
> stopped so a connection refused will induce the bad node to be pruned from 
> the candidates.  Recovery succeeds, the lease is released, under replication 
> is fixed, and block is invalidated from the bad node.
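The description above implies a fix for stage 2: prune nodes whose update 
fails and commit with the survivors, rather than failing the whole recovery. 
A minimal sketch of that behavior, with made-up names (this is not the actual 
recovery code; the real update goes through the InterDatanodeProtocol):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/** Toy sketch: stage 2 tolerates per-node failures (hypothetical names). */
class Stage2Sketch {
    /**
     * Applies the stage-2 update to each node in the sync list and returns
     * the nodes that succeeded. A failed node is simply pruned; the NN can
     * later invalidate its stale replica, instead of the entire recovery
     * aborting because one node failed.
     */
    static List<String> updateSyncList(List<String> syncList,
                                       Predicate<String> updateOk) {
        List<String> succeeded = new ArrayList<>();
        for (String dn : syncList) {
            if (updateOk.test(dn)) {
                succeeded.add(dn);  // node applied the recovery update
            }
            // else: prune the node; recovery commits without it
        }
        return succeeded;
    }
}
```

With this shape, a single connection-refused node no longer blocks lease 
release; under-replication is fixed afterward, as the description notes.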

This message was sent by Atlassian JIRA
