[ https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phabricator updated HBASE-5074:
-------------------------------

    Attachment: D1521.1.patch

dhruba requested code review of "[jira] [HBASE-5074] Support checksums in HBase 
block cache".
Reviewers: mbautin

  HFile is enhanced to store a checksum for each block. HDFS checksum 
verification is avoided while reading data into the block cache. On a checksum 
verification failure, we retry the file system read request with hdfs checksums 
switched on (thanks Todd).
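
  The retry flow described above can be sketched as follows. This is a minimal 
plain-Java illustration of the pattern, not the patch's actual API: the 
BlockReader interface, the method names, and the use of CRC32 are all mine.

```java
import java.util.zip.CRC32;

// Sketch of "verify our own checksum first; on mismatch, re-read with HDFS
// checksums switched on". Names are illustrative, not from the patch.
public class ChecksumFallbackSketch {

    interface BlockReader {
        // hdfsChecksumOn selects whether HDFS verifies the read itself.
        byte[] read(boolean hdfsChecksumOn);
    }

    static long crc32(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    /** Fast path reads without HDFS checksums; fall back if ours fails. */
    static byte[] readWithFallback(BlockReader reader, long expectedCrc) {
        byte[] block = reader.read(false);   // fast path: one disk iop
        if (crc32(block) == expectedCrc) {
            return block;
        }
        return reader.read(true);            // retry with HDFS verification
    }
}
```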

  I have a benchmark that shows that it reduces disk iops by about 40%. In 
this experiment, essentially all of the regionserver machine's memory is 
allocated to the regionserver's JVM, so the OS buffer cache is negligible. I 
also measured negligible (<5%) additional CPU usage while using hbase-level 
checksums.

  The salient points of this patch:

  1. Each hfile's trailer used to have a 4 byte version number. I enhanced 
this so that these 4 bytes are interpreted as a (major version number, minor 
version number) pair. Pre-existing hfiles have a minor version of 0. The new 
hfile format has a minor version of 1 (thanks Mikhail). The hfile major 
version remains unchanged at 2. The reason I did not introduce a new major 
version number is that the code changes needed to store/read checksums do not 
differ much from the existing V2 writers/readers.
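
  One plausible encoding of that (major, minor) pair in a single 4-byte 
field, sketched below: keep the major version in the low three bytes and the 
minor version in the high byte, so a pre-existing trailer that stored the 
plain integer 2 decodes as major=2, minor=0. The helper names are 
illustrative, not necessarily those in the patch.

```java
// Packs/unpacks a (major, minor) version pair into the one 4-byte trailer
// field. Old files wrote just the major version, so they decode as minor=0.
public class TrailerVersion {
    static int pack(int major, int minor) {
        return major | (minor << 24);     // low 3 bytes: major, high byte: minor
    }
    static int major(int packed) {
        return packed & 0x00ffffff;
    }
    static int minor(int packed) {
        return packed >>> 24;
    }
}
```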

  2. Introduced an HFileSystem object which encapsulates the FileSystem 
objects needed to access data from hfiles and hlogs. HDFS FileSystem objects 
already have the ability to switch off checksum verification for reads.
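
  The shape of that encapsulation can be illustrated as below: two handles to 
the same filesystem, one with checksum verification disabled for hfile data 
reads and one with it enabled for everything else. Hadoop's real switch is 
FileSystem.setVerifyChecksum(boolean); the tiny stand-in type here just keeps 
the example self-contained, and the field names are mine.

```java
// Illustrative sketch (not the patch's actual class) of holding two
// filesystem handles with different checksum-verification settings.
public class HFileSystemSketch {
    static class Fs {
        boolean verifyChecksum = true;
        void setVerifyChecksum(boolean v) { verifyChecksum = v; }
    }

    final Fs fs;            // normal handle: HDFS checksums on (hlogs, etc.)
    final Fs noChecksumFs;  // hfile-read handle: HDFS checksums off

    HFileSystemSketch() {
        fs = new Fs();
        noChecksumFs = new Fs();
        // HBase verifies its own block checksums on this path instead.
        noChecksumFs.setVerifyChecksum(false);
    }
}
```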

  3. The majority of the code changes are located in the hbase.io.hfile 
package. The retry of a read on an initial checksum failure occurs inside the 
hbase.io.hfile package itself. The code changes to the hbase.regionserver 
package are minor.

  4. The format of an hfileblock is the header, followed by the data, 
followed by the checksum(s). Each 16K (configurable) chunk of data has a 4 
byte checksum. The hfileblock header has two additional fields: a 4 byte 
value to store the bytesPerChecksum and a 4 byte value to store the size of 
the user data (excluding the checksum data). This is well explained in the 
associated javadocs.
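
  The checksum layout arithmetic above works out as in this sketch: one 
4-byte checksum per bytesPerChecksum-sized chunk, with the last chunk 
possibly short. CRC32 stands in for whatever checksum algorithm the patch 
actually uses; the class and method names are illustrative.

```java
import java.util.zip.CRC32;

// Sketch of the per-chunk checksum layout: ceil(dataSize / bytesPerChecksum)
// checksums of 4 bytes each are appended after the block's data.
public class BlockChecksumLayout {

    /** Number of 4-byte checksums needed for dataSize bytes of user data. */
    static int numChunks(int dataSize, int bytesPerChecksum) {
        return (dataSize + bytesPerChecksum - 1) / bytesPerChecksum;
    }

    /** One CRC32 value per chunk; the final chunk may be shorter. */
    static long[] chunkChecksums(byte[] data, int bytesPerChecksum) {
        int n = numChunks(data.length, bytesPerChecksum);
        long[] sums = new long[n];
        for (int i = 0; i < n; i++) {
            int off = i * bytesPerChecksum;
            int len = Math.min(bytesPerChecksum, data.length - off);
            CRC32 crc = new CRC32();
            crc.update(data, off, len);
            sums[i] = crc.getValue();
        }
        return sums;
    }
}
```

  For example, a 40000-byte block with the default 16K chunks needs 3 
checksums, i.e. only 12 bytes of overhead.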

  5. I added a test for backward compatibility. I will be writing more unit 
tests that trigger checksum verification failures aggressively. I have left a 
few redundant log messages in the code (just for easier debugging) and will 
remove them in a later stage of this patch. I will also be adding metrics on 
the number of checksum verification failures/successes in a later version of 
this diff.

  6. By default, hbase-level checksums are switched on and hdfs-level 
checksums are switched off for hfile reads. There are no changes to the HLog 
code path here.

TEST PLAN
  The default setting is to switch on hbase checksums for hfile-reads, thus all 
existing tests actually validate the new code pieces. I will be writing more 
unit tests for triggering checksum verification failures.

REVISION DETAIL
  https://reviews.facebook.net/D1521

AFFECTED FILES
  src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
  src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java
  src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitTransaction.java
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java
  src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWithBloomError.java
  src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java
  src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java
  src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestCacheOnWrite.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileReaderV1.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/CacheTestUtils.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockIndex.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileDataBlockEncoder.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java
  src/test/java/org/apache/hadoop/hbase/io/TestHalfStoreFileReader.java
  src/test/java/org/apache/hadoop/hbase/util/TestMergeTable.java
  src/main/java/org/apache/hadoop/hbase/HConstants.java
  src/main/java/org/apache/hadoop/hbase/util/HFileSystem.java
  src/main/java/org/apache/hadoop/hbase/util/ChecksumByteArrayOutputStream.java
  src/main/java/org/apache/hadoop/hbase/util/CompoundBloomFilter.java
  src/main/java/org/apache/hadoop/hbase/util/ChecksumFactory.java
  src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRegionHandler.java
  src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
  src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileDataBlockEncoderImpl.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileDataBlockEncoder.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileReader.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/NoOpDataBlockEncoder.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/FixedFileTrailer.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java


                
> support checksums in HBase block cache
> --------------------------------------
>
>                 Key: HBASE-5074
>                 URL: https://issues.apache.org/jira/browse/HBASE-5074
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: D1521.1.patch
>
>
> The current implementation of HDFS stores the data in one block file and the 
> metadata(checksum) in another block file. This means that every read into the 
> HBase block cache actually consumes two disk iops, one to the datafile and 
> one to the checksum file. This is a major problem for scaling HBase, because 
> HBase is usually bottlenecked on the number of random disk iops that the 
> storage-hardware offers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
