CRC does not match when retrying appending a partial block ----------------------------------------------------------
Key: HDFS-1228
URL: https://issues.apache.org/jira/browse/HDFS-1228
Project: Hadoop HDFS
Issue Type: Bug
Components: data-node
Affects Versions: 0.20.1
Reporter: Thanh Do

- Summary: when appending to a partial block, it is possible that the retry issued after an exception fails due to a checksum mismatch. The append operation is therefore not atomic (it should either complete fully or fail completely).

- Setup:
+ # available datanodes = 2
+ # disks / datanode = 1
+ # failures = 1
+ failure type = bad disk
+ when/where failure happens = (see below)

- Details:
The client writes 16 bytes to dn1 and dn2, and the write completes. So far so good. The meta file now contains a 7-byte header plus a 4-byte checksum (CK1, the checksum for the 16 bytes of data).

The client then appends 16 more bytes. Assume an exception is raised in BlockReceiver.receivePacket() at dn2, so the client knows dn2 is bad. BUT the append at dn1 has completed (i.e., both the data portion and the checksum portion have been written to disk, to the corresponding block file and meta file), meaning the checksum file at dn1 now contains the 7-byte header plus a 4-byte checksum (CK2, the checksum for the full 32 bytes of data).

Because dn2 hit an exception, the client calls recoverBlock and starts the append to dn1 again. When dn1 receives the 16 bytes of data, it verifies that the pre-computed CRC on disk (CK2) matches the checksum it recalculates for the partial block (CK1), which obviously does not match. Hence an exception is thrown and the retry fails.

- A similar bug has been reported at https://issues.apache.org/jira/browse/HDFS-679, but here it manifests in a different context.

This bug was found by our Failure Testing Service framework:
http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
For questions, please email us: Thanh Do (than...@cs.wisc.edu) and Haryadi Gunawi (hary...@eecs.berkeley.edu)
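The mismatch itself is mechanical and easy to reproduce outside HDFS. Below is a minimal standalone sketch (plain java.util.zip.CRC32, not Hadoop code; HDFS in 0.20.1 likewise stores one CRC32 checksum per 512-byte chunk, and the class and variable names here are illustrative assumptions) showing why the checksum recalculated over the first 16 bytes (CK1) can never match the checksum dn1 persisted over all 32 bytes (CK2):

import java.util.zip.CRC32;

// Illustrative sketch only, not Hadoop code: demonstrates the CK1/CK2
// mismatch described above for a single partial chunk.
public class AppendCrcMismatch {

    // Checksum over the first len bytes of data, as a datanode would
    // compute it for one chunk.
    static long crcOf(byte[] data, int len) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, len);
        return crc.getValue();
    }

    public static void main(String[] args) {
        byte[] data = new byte[32];
        for (int i = 0; i < data.length; i++) data[i] = (byte) i;

        long ck1 = crcOf(data, 16); // stored after the initial 16-byte write
        long ck2 = crcOf(data, 32); // stored by dn1 after its append completed

        // On retry, the client still believes the block holds 16 bytes, so
        // the recalculated checksum is CK1, while dn1's meta file holds CK2.
        System.out.printf("CK1 (16 bytes) = %08x%n", ck1);
        System.out.printf("CK2 (32 bytes) = %08x%n", ck2);
        System.out.println("match: " + (ck1 == ck2)); // false -> retry aborts
    }
}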