Kihwal Lee created HDFS-8395:
--------------------------------

             Summary: Verify on-disk data after transferring block data
                 Key: HDFS-8395
                 URL: https://issues.apache.org/jira/browse/HDFS-8395
             Project: Hadoop HDFS
          Issue Type: Improvement
            Reporter: Kihwal Lee
            Priority: Critical


Currently, the integrity of on-disk data is not checked during pipeline recovery 
or replication. The target of a pipeline-recovery transfer can detect 
corruption, but sometimes only long after the corruption has happened (e.g. 
HDFS-4660). If multiple pipeline failures occur, delayed corruption detection 
can cause data loss.

During replication involving multiple destinations, if a middle node corrupts 
the data, the healthy source can end up being marked corrupt. Because 
replication lacks a full ack mechanism, the corrupt replica continues to be 
written and is finalized. That replica then becomes the source for further 
replication, because the original source is marked corrupt. All subsequent 
replications fail, and the result is a missing block.

Adding on-disk corruption detection at the appropriate places would improve 
the situation.
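
To illustrate the kind of check meant here, below is a minimal sketch that 
re-reads a replica from disk after a transfer and compares per-chunk checksums 
against the expected values. It is not taken from the HDFS code base: the class 
OnDiskVerifier, its method names, the 512-byte chunk size and the use of plain 
CRC32 are all assumptions made for illustration only.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32;

/** Hypothetical helper for illustration; not an HDFS class. */
final class OnDiskVerifier {
    /** Assumed chunk size for this sketch. */
    static final int BYTES_PER_CHECKSUM = 512;

    /** Reads the file back from disk and computes one CRC32 per chunk. */
    static List<Long> chunkChecksums(Path file) {
        List<Long> sums = new ArrayList<>();
        byte[] buf = new byte[BYTES_PER_CHECKSUM];
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            // readNBytes fills the buffer unless end of file is reached first.
            while ((n = in.readNBytes(buf, 0, buf.length)) > 0) {
                CRC32 crc = new CRC32();
                crc.update(buf, 0, n);
                sums.add(crc.getValue());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sums;
    }

    /**
     * Returns the index of the first chunk whose on-disk checksum differs
     * from the expected one (or is missing on either side), or -1 if the
     * on-disk data matches completely.
     */
    static int firstCorruptChunk(Path file, List<Long> expected) {
        List<Long> actual = chunkChecksums(file);
        int n = Math.max(actual.size(), expected.size());
        for (int i = 0; i < n; i++) {
            if (i >= actual.size() || i >= expected.size()
                    || !actual.get(i).equals(expected.get(i))) {
                return i;
            }
        }
        return -1;
    }
}
```

In this sketch a sender would call firstCorruptChunk on its own replica before 
or after transferring it, and treat any result other than -1 as a reason to 
stop using that replica as a source. Where exactly such a check belongs in the 
pipeline-recovery and replication paths is the open question of this issue.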



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
