Doug Cutting wrote:
[ Moving discussion to hadoop-dev. -drc ]
Raghu Angadi wrote:
This is good validation of how important ECC memory is. Currently the
HDFS client deletes a block when it notices a checksum error. After
moving to block-level CRCs soon, we should make the Datanode re-validate
the block before deciding to delete it.
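For illustration, a minimal sketch of that kind of re-validation,
assuming the block data and a side file of one 4-byte CRC per
io.bytes.per.checksum chunk are both on disk; the class and method
names are made up and this is not the actual Datanode code:

import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.zip.CRC32;

class BlockRevalidator {
    // Returns true only if the on-disk data really fails its stored checksums.
    static boolean isCorrupt(String blockFile, String checksumFile,
                             int bytesPerChecksum) throws IOException {
        try (FileInputStream data = new FileInputStream(blockFile);
             DataInputStream sums =
                 new DataInputStream(new FileInputStream(checksumFile))) {
            byte[] chunk = new byte[bytesPerChecksum];
            int read;
            while ((read = data.readNBytes(chunk, 0, chunk.length)) > 0) {
                CRC32 crc = new CRC32();
                crc.update(chunk, 0, read);
                int stored = sums.readInt();   // one 4-byte CRC per chunk (assumed format)
                if (stored != (int) crc.getValue()) {
                    return true;               // genuine corruption on disk
                }
            }
        }
        return false;                          // block still verifies; don't delete it
    }
}

A Datanode could run something like this when a client reports a bad
block and keep the replica if it still verifies cleanly, e.g. when the
client's copy was corrupted by a memory error.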
It also emphasizes how important end-to-end checksums are. Data should
also be checksummed as soon as possible after it is generated, before it
has a chance to be corrupted.
Ideally, the initial buffer that stores the data should be small, and
data should be checksummed as this initial buffer is flushed.
In my implementation of block-level CRCs (which does not affect
ChecksumFileSystem in HADOOP-928), we don't buffer checksum data at all.
As soon as io.bytes.per.checksum bytes are written, the checksum is
written directly to the backupstream. I have removed stream buffering in
multiple places in DFSClient. But it is still affected by the buffering
issue you mentioned below.
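Roughly, that write path could look like the sketch below (hypothetical
names; separate data and checksum streams assumed purely for
illustration). A CRC is emitted as soon as each io.bytes.per.checksum
sized chunk fills, with no checksum buffering in between:

import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.CRC32;

class ChunkedChecksumOutputStream extends OutputStream {
    private final OutputStream dataOut;  // downstream data path
    private final OutputStream sumOut;   // checksum path (stand-in for the backupstream above)
    private final byte[] chunk;          // small buffer of io.bytes.per.checksum bytes
    private int count = 0;

    ChunkedChecksumOutputStream(OutputStream dataOut, OutputStream sumOut,
                                int bytesPerChecksum) {
        this.dataOut = dataOut;
        this.sumOut = sumOut;
        this.chunk = new byte[bytesPerChecksum];
    }

    @Override
    public void write(int b) throws IOException {
        chunk[count++] = (byte) b;
        if (count == chunk.length) {
            flushChunk();                // checksum as soon as the chunk is full
        }
    }

    private void flushChunk() throws IOException {
        CRC32 crc = new CRC32();
        crc.update(chunk, 0, count);
        int sum = (int) crc.getValue();
        // Write the CRC immediately; nothing accumulates in a checksum buffer.
        sumOut.write((sum >>> 24) & 0xff);
        sumOut.write((sum >>> 16) & 0xff);
        sumOut.write((sum >>> 8) & 0xff);
        sumOut.write(sum & 0xff);
        dataOut.write(chunk, 0, count);
        count = 0;
    }

    @Override
    public void close() throws IOException {
        if (count > 0) flushChunk();     // checksum the final partial chunk
        sumOut.close();
        dataOut.close();
    }
}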
In the current implementation, the small checksum buffer is the second
buffer; the initial buffer is the larger, io.buffer.size buffer. To
provide maximum protection against memory errors, this situation should
be reversed.
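The difference is just the wrapping order of the streams. A small,
self-contained sketch, using the JDK's CheckedOutputStream as a
stand-in for the per-chunk checksum code and ByteArrayOutputStream as
the destination, purely to show the two compositions:

import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.CRC32;
import java.util.zip.CheckedOutputStream;

class BufferOrderDemo {
    public static void main(String[] args) throws IOException {
        int ioBufferSize = 64 * 1024;  // stand-in for io.buffer.size

        // Current order: bytes sit in the large, long-lived buffer *before*
        // they are checksummed, so a bit flip there is folded into the
        // checksum and never detected.
        OutputStream current = new BufferedOutputStream(
            new CheckedOutputStream(new ByteArrayOutputStream(), new CRC32()),
            ioBufferSize);

        // Reversed order: bytes are checksummed as soon as the application
        // writes them and only then enter the large buffer, so a flip there
        // fails verification later instead of going unnoticed.
        OutputStream reversed = new CheckedOutputStream(
            new BufferedOutputStream(new ByteArrayOutputStream(), ioBufferSize),
            new CRC32());

        current.write("example data".getBytes());
        reversed.write("example data".getBytes());
        current.close();
        reversed.close();
    }
}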
This is discussed in https://issues.apache.org/jira/browse/HADOOP-928.
Perhaps a new issue should be filed to reverse the order of these
buffers, so that data is checksummed before entering the larger,
longer-lived buffer?
This reversal still does not help block-level CRCs. We could remove
buffering altogether at the FileSystem level and let the FS
implementations decide how to buffer.
Raghu.
Doug