[
https://issues.apache.org/jira/browse/HDFS-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Kihwal Lee updated HDFS-5438:
-----------------------------
Status: Patch Available (was: Open)
> Flaws in block report processing can cause data loss
> ----------------------------------------------------
>
> Key: HDFS-5438
> URL: https://issues.apache.org/jira/browse/HDFS-5438
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 2.2.0, 0.23.9
> Reporter: Kihwal Lee
> Assignee: Kihwal Lee
> Priority: Critical
> Attachments: HDFS-5438.trunk.patch
>
>
> The incremental block reports from data nodes and block commits are
> asynchronous. This becomes troublesome when the gen stamp for a block is
> changed during a write pipeline recovery.
> * If an incremental block report is delayed from a node but NN had enough
> replicas already, a report with the old gen stamp may be received after block
> completion. This replica will be correctly marked corrupt. But if the node
> had participated in the pipeline recovery, a new (delayed) report with the
> correct gen stamp will come soon. However, this report won't have any effect
> on the corrupt state of the replica.
> * If block reports are received while the block is still under construction
> (i.e. client's call to make block committed has not been received by NN),
> they are blindly accepted regardless of the gen stamp. If a failed node
> reports in with the old gen stamp while pipeline recovery is on-going, it
> will be accepted and counted as valid during commit of the block.
> Due to the above two problems, correct replicas can be marked corrupt and
> corrupt replicas can be accepted during commit. So far we have observed two
> cases in production.
> * The client hangs forever while trying to close a file. All replicas are
> marked corrupt.
> * After the successful close of a file, reads fail. Corrupt replicas are
> accepted during commit and valid replicas are marked corrupt afterward.
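
The two flaws described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the attached patch and not the real BlockManager code: the class BlockReplicaModel and its methods (recoverPipeline, processReport, isValid, countValid) are hypothetical names invented for illustration. The idea it shows is one possible way to make report handling insensitive to ordering: remember the highest gen stamp each node has reported, and decide validity by comparing against the current expected gen stamp at the time it is needed, whether or not the block is still under construction.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical toy model of NameNode-side bookkeeping for ONE block.
 * It is not the HDFS BlockManager; all names here are assumptions.
 */
class BlockReplicaModel {
    // Latest gen stamp the NN expects for this block.
    private long expectedGenStamp;

    // Highest gen stamp each data node has reported so far.
    private final Map<String, Long> latestReported = new HashMap<>();

    BlockReplicaModel(long initialGenStamp) {
        this.expectedGenStamp = initialGenStamp;
    }

    /** Write pipeline recovery bumps the expected gen stamp. */
    void recoverPipeline(long newGenStamp) {
        if (newGenStamp <= expectedGenStamp) {
            throw new IllegalArgumentException("gen stamp must increase");
        }
        expectedGenStamp = newGenStamp;
    }

    /**
     * Record an incremental block report. Keeping the maximum means a
     * delayed report with an old gen stamp cannot hide a newer one, and
     * a later report with the correct gen stamp clears an earlier
     * "corrupt" verdict.
     */
    void processReport(String node, long reportedGenStamp) {
        latestReported.merge(node, reportedGenStamp, Long::max);
    }

    /** A replica counts as valid only if its gen stamp matches exactly. */
    boolean isValid(String node) {
        Long gs = latestReported.get(node);
        return gs != null && gs == expectedGenStamp;
    }

    /** Number of replicas that would be accepted at commit time. */
    int countValid() {
        int valid = 0;
        for (long gs : latestReported.values()) {
            if (gs == expectedGenStamp) {
                valid++;
            }
        }
        return valid;
    }
}
```

In this model, a failed node that reports the old gen stamp during pipeline recovery is never counted at commit, and a node that took part in the recovery becomes valid as soon as its report with the new gen stamp arrives, even if an older report was processed first.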
--
This message was sent by Atlassian JIRA
(v6.1#6144)