Experiment with alternate settings for io.bytes.per.checksum for HFiles
-----------------------------------------------------------------------

                 Key: HBASE-2478
                 URL: https://issues.apache.org/jira/browse/HBASE-2478
             Project: Hadoop HBase
          Issue Type: Improvement
            Reporter: Kannan Muthukkaruppan


HDFS keeps a separate "checksum" file for every block. By default, 
io.bytes.per.checksum is set to 512 and each checksum is 4 bytes, i.e. for 
every 512 bytes of data in the block we maintain a 4-byte checksum. For 4TB 
of data, for instance, that's about 31GB of checksum data.
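
For concreteness, a quick back-of-the-envelope sketch of that arithmetic 
(plain Java, no HDFS APIs involved; the 4TB figure is just the example above):

{code:java}
public class ChecksumOverhead {
  public static void main(String[] args) {
    long data = 4L * 1000 * 1000 * 1000 * 1000;  // 4TB of data
    int bytesPerChecksum = 512;                  // the default setting
    int crcSize = 4;                             // each checksum is 4 bytes
    long overhead = (data / bytesPerChecksum) * crcSize;
    System.out.println(overhead / 1e9 + " GB of checksum data");  // ~31.25
  }
}
{code}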

A read that needs to fetch a small section (such as a 64k HFile block) from 
an HDFS block, especially on a cold access, is likely to end up doing two 
random disk reads: one from the data file for the block and one from the 
checksum file.

A thought was that, instead of keeping a checksum for every 512 bytes, given 
that HBase will interact with HDFS on reads at the granularity of the HBase 
block size (typically 64k, but smaller if compressed), should we consider 
keeping checksums at a coarser granularity (e.g., every 8k bytes) for HFiles? 
The advantage is that the checksum files would be much smaller (in proportion 
to the data), and the hot working set for "checksum data" should fit better 
in the OS buffer cache (thus eliminating a good majority of the disk seeks 
for checksum data).

The intent of this JIRA is to experiment with different settings for 
"io.bytes.per.checksum" for HFiles.

Note: for the previous example of 4TB of data, an io.bytes.per.checksum 
setting of 8k would drop the size of the checksum data to about 2GB.
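
One way to run such an experiment from the client side might be to override 
the property in the Configuration used to open the FileSystem that writes 
the HFiles. A minimal sketch; whether a per-client override takes effect 
here, versus a cluster-wide hdfs-site.xml change, is an assumption to verify:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class CoarseChecksumExperiment {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Try 8k checksum chunks instead of the default 512 for HFile writes.
    conf.setInt("io.bytes.per.checksum", 8192);
    FileSystem fs = FileSystem.get(conf);  // HFile writers would use this fs
    System.out.println("io.bytes.per.checksum = "
        + conf.getInt("io.bytes.per.checksum", 512));
  }
}
{code}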

Making io.bytes.per.checksum too big might reduce the effectiveness of the 
checksum, so that trade-off also needs to be taken into account when 
determining a good value.
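
To make that trade-off concrete, a small sketch that tabulates the checksum 
metadata per 4TB for a few candidate settings (the candidate values are 
illustrative, not recommendations):

{code:java}
public class ChecksumTradeoff {
  public static void main(String[] args) {
    long data = 4L * 1000 * 1000 * 1000 * 1000;  // the 4TB example above
    for (int bpc : new int[] {512, 2048, 8192, 65536}) {
      long metadata = (data / bpc) * 4;          // 4-byte CRC per chunk
      // Larger chunks mean less metadata to cache, but each CRC covers
      // more data, so a mismatch invalidates a larger chunk.
      System.out.printf("bpc=%-6d -> %.2f GB of checksum data%n",
          bpc, metadata / 1e9);
    }
  }
}
{code}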

[For HLog files, on the other hand, I suspect we would want to leave the 
checksum at a finer granularity, because my understanding is that if we are 
doing lots of small writes/syncs (as we do to HLogs), finer grained checksums 
are better (because the code currently doesn't do a rolling checksum, and 
needs to rewind to the nearest checksum block boundary and recompute the 
checksum on every edit).]
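
To illustrate that last point, a toy model of the rewind cost per sync. This 
models only the behavior described above, not the actual HDFS output-stream 
code, and the edit sizes are made up:

{code:java}
import java.util.Random;

public class HLogRewindCost {
  public static void main(String[] args) {
    int syncs = 100000;
    for (int bpc : new int[] {512, 8192}) {
      Random r = new Random(42);       // same edit sequence for both settings
      long offset = 0, reworked = 0;
      for (int i = 0; i < syncs; i++) {
        offset += 1 + r.nextInt(200);  // a small HLog edit, then a sync
        reworked += offset % bpc;      // bytes rewound and re-checksummed
      }
      System.out.printf("bpc=%-5d avg bytes recomputed per sync ~%.0f%n",
          bpc, (double) reworked / syncs);
    }
  }
}
{code}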



