What Doug suggested makes sense. We should make the initial buffer
bytesPerChecksum bytes long and use the user-defined buffer size for the
second buffer. This would also solve most of the problems that I described in
HADOOP-1124.
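
To make the proposed ordering concrete, here is a rough sketch of what I
have in mind. It is not the actual ChecksumFileSystem code: the class name
ChecksumFirstOutputStream, its constructor parameters, and the choice of
CRC32 with a 4-byte checksum per chunk are assumptions made only for
illustration. Data lands first in a small buffer of bytesPerChecksum bytes,
is checksummed when that buffer fills, and only then enters the larger
buffer of the user-defined size.

```java
import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.CRC32;

/** Hypothetical sketch only; not Hadoop's real implementation. */
class ChecksumFirstOutputStream extends OutputStream {
    private final byte[] chunk;           // first buffer: bytesPerChecksum bytes
    private int count = 0;                // bytes currently held in chunk
    private final OutputStream data;      // second buffer: user-defined size
    private final DataOutputStream sums;  // receives one 4-byte CRC per chunk
    private final CRC32 crc = new CRC32();

    ChecksumFirstOutputStream(OutputStream rawData, OutputStream rawSums,
                              int bytesPerChecksum, int bufferSize) {
        this.chunk = new byte[bytesPerChecksum];
        this.data = new BufferedOutputStream(rawData, bufferSize);
        this.sums = new DataOutputStream(rawSums);
    }

    @Override
    public void write(int b) throws IOException {
        chunk[count++] = (byte) b;
        if (count == chunk.length) {
            flushChunk();                 // checksum as soon as the chunk is full
        }
    }

    /** Checksums the pending chunk, then hands it to the larger buffer. */
    private void flushChunk() throws IOException {
        if (count == 0) {
            return;
        }
        crc.reset();
        crc.update(chunk, 0, count);
        sums.writeInt((int) crc.getValue());
        data.write(chunk, 0, count);      // data is already covered by a checksum
        count = 0;
    }

    @Override
    public void flush() throws IOException {
        // Only pushes complete chunks downstream; a partial chunk stays in
        // the small buffer so that chunk boundaries remain aligned.
        data.flush();
        sums.flush();
    }

    @Override
    public void close() throws IOException {
        flushChunk();                     // the last chunk may be short
        data.close();
        sums.close();
    }
}
```

With bytesPerChecksum set to 4, writing 10 bytes and closing the stream
produces three chunks (4, 4, and 2 bytes) and therefore three checksums.
The data sits unprotected only while it is in the small first buffer,
which is the point of the reversal.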

Hairong

-----Original Message-----
From: Raghu Angadi [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, May 16, 2007 11:39 AM
To: hadoop-dev@lucene.apache.org
Subject: Re: Many Checksum Errors

Doug Cutting wrote:
> [ Moving discussion to hadoop-dev.  -drc ]
> 
> Raghu Angadi wrote:
>> This is good validation of how important ECC memory is. Currently the HDFS 
>> client deletes a block when it notices a checksum error. After moving 
>> to block-level CRCs soon, we should make the Datanode re-validate the 
>> block before deciding to delete it.
> 
> It also emphasizes how important end-to-end checksums are.  Data 
> should also be checksummed as soon as possible after it is generated, 
> before it has a chance to be corrupted.
> 
> Ideally, the initial buffer that stores the data should be small, and 
> data should be checksummed as this initial buffer is flushed.

In my implementation of block-level CRCs (this does not affect ChecksumFileSystem
in HADOOP-928), we don't buffer checksum data at all.
As soon as io.bytes.per.checksum bytes are written, the checksum is written directly
to the backupstream. I have removed stream buffering in multiple places in
DFSClient. But this is still affected by the buffering issue you
mentioned below.

> In the
> current implementation, the small checksum buffer is the second 
> buffer, the initial buffer is the larger, io.buffer.size buffer.  To 
> provide maximum protection against memory errors, this situation 
> should be reversed.
> 
> This is discussed in https://issues.apache.org/jira/browse/HADOOP-928. 
> Perhaps a new issue should be filed to reverse the order of these 
> buffers, so that data is checksummed before entering the larger, 
> longer-lived buffer?

This reversal still does not help block-level CRCs. We could remove
buffering altogether at the FileSystem level and let the FS implementations
decide how to buffer.

Raghu.

> Doug

