[ 
https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HBASE-5074:
-------------------------------

    Attachment: D1521.11.patch

dhruba updated the revision "[jira] [HBASE-5074] Support checksums in HBase 
block cache".
Reviewers: mbautin

  1. I modified the ChecksumType code to not dum an exception stack trace to 
the output if CRC32C is not
     available. Ted's suggestion of pulling CRC32C into hbase code sounds 
reasonable, but I would like
     to do it as part of another jira. Also, if hbase moves to hadoop 2.0, then 
it will automatically
     get CRC32C.
  2. I added a "minorVersion=" to the output of HFilePrettyPrinter.
     Stack, will you be able to run "bin/hbase hfile -m -f filename on your 
cluster to verify that this
     checksum feature is switched on. If it prints minorVersion=1, then you are 
using this feature.
     Do you still need a print somewhere saying that this feature in on? The 
older files that were
     pre-created before that patch was deployed will still use hdfs-checksum 
verification, so you
     could possible see hdfs-checksum-verification on stack traces on a live 
regionserver.
  3. I did some thinking (again) on the semantics of major version and minor 
version. The major version
     represents a new file format, e.g. suppose we add a new thing to the 
file's triailer, then we
     might need to bump up the major version. The minor version indicates the 
format of data inside a
     HFileBlock.
     In the current code, major versions 1 and 2 share the same HFileFormat 
(indicated by minor version
     of 0). In this patch, we have a new minorVersion 1 because the data 
contents inside a HFileBlock
     has changed. Tecnically, both major version 1 and 2 could have either 
minorVerion 0 or 1.
     Now, suppose we want to add a new field to the trailer of the HFile. We 
can bump the majorVersion
     to 3 but do not change the minorVersion because we did not change the 
internal format of an
     HFileBlock.
     Given the above, does it make sense to say that HFileBlock is independent 
of the majorVersion?

REVISION DETAIL
  https://reviews.facebook.net/D1521

AFFECTED FILES
  src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
  src/test/java/org/apache/hadoop/hbase/regionserver/CreateRandomStoreFile.java
  
src/test/java/org/apache/hadoop/hbase/regionserver/handler/TestCloseRegionHandler.java
  src/test/java/org/apache/hadoop/hbase/regionserver/TestFSErrorsExposed.java
  src/test/java/org/apache/hadoop/hbase/regionserver/HFileReadWriteTest.java
  
src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java
  src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
  src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestChecksum.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestCacheOnWrite.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileReaderV1.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestFixedFileTrailer.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/CacheTestUtils.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockIndex.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileDataBlockEncoder.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileWriterV2.java
  
src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java
  src/test/java/org/apache/hadoop/hbase/util/MockRegionServerServices.java
  src/main/java/org/apache/hadoop/hbase/HConstants.java
  src/main/java/org/apache/hadoop/hbase/fs
  src/main/java/org/apache/hadoop/hbase/fs/HFileSystem.java
  src/main/java/org/apache/hadoop/hbase/util/ChecksumType.java
  src/main/java/org/apache/hadoop/hbase/util/CompoundBloomFilter.java
  src/main/java/org/apache/hadoop/hbase/util/ChecksumFactory.java
  src/main/java/org/apache/hadoop/hbase/regionserver/RegionServerServices.java
  
src/main/java/org/apache/hadoop/hbase/regionserver/metrics/RegionServerMetrics.java
  src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
  src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
  src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
  src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/ChecksumUtil.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileDataBlockEncoderImpl.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileDataBlockEncoder.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileReader.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/NoOpDataBlockEncoder.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/FixedFileTrailer.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java

                
> support checksums in HBase block cache
> --------------------------------------
>
>                 Key: HBASE-5074
>                 URL: https://issues.apache.org/jira/browse/HBASE-5074
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: D1521.1.patch, D1521.1.patch, D1521.10.patch, 
> D1521.10.patch, D1521.10.patch, D1521.10.patch, D1521.10.patch, 
> D1521.11.patch, D1521.11.patch, D1521.2.patch, D1521.2.patch, D1521.3.patch, 
> D1521.3.patch, D1521.4.patch, D1521.4.patch, D1521.5.patch, D1521.5.patch, 
> D1521.6.patch, D1521.6.patch, D1521.7.patch, D1521.7.patch, D1521.8.patch, 
> D1521.8.patch, D1521.9.patch, D1521.9.patch
>
>
> The current implementation of HDFS stores the data in one block file and the 
> metadata(checksum) in another block file. This means that every read into the 
> HBase block cache actually consumes two disk iops, one to the datafile and 
> one to the checksum file. This is a major problem for scaling HBase, because 
> HBase is usually bottlenecked on the number of random disk iops that the 
> storage-hardware offers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to