[
https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Phabricator updated HBASE-5074:
-------------------------------
Attachment: D1521.11.patch
dhruba updated the revision "[jira] [HBASE-5074] Support checksums in HBase
block cache".
Reviewers: mbautin
1. I modified the ChecksumType code to not dump an exception stack trace to
the output if CRC32C is not available. Ted's suggestion of pulling CRC32C
into hbase code sounds reasonable, but I would like to do it as part of
another jira. Also, if hbase moves to hadoop 2.0, then it will automatically
get CRC32C.
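A rough sketch of the fallback behavior described above: try to load a CRC32C implementation reflectively and quietly fall back to plain CRC32 when it is absent, without printing a stack trace. This is an illustration only, not the actual ChecksumType code; the class name probed here (`java.util.zip.CRC32C`, JDK 9+) stands in for whatever CRC32C implementation the classpath provides.

```java
import java.util.zip.CRC32;
import java.util.zip.Checksum;

public class ChecksumFallbackDemo {
    // Hypothetical illustration of the fallback: CRC32C may not exist on
    // the classpath, so load it reflectively and fall back to CRC32
    // silently instead of dumping the exception's stack trace.
    static Checksum newChecksum() {
        try {
            // java.util.zip.CRC32C is only present on newer JDKs.
            Class<?> c = Class.forName("java.util.zip.CRC32C");
            return (Checksum) c.getDeclaredConstructor().newInstance();
        } catch (Exception e) {
            // CRC32C unavailable: no stack trace, just use CRC32.
            return new CRC32();
        }
    }

    public static void main(String[] args) {
        Checksum sum = newChecksum();
        byte[] data = "hbase".getBytes();
        sum.update(data, 0, data.length);
        System.out.println(sum.getValue());
    }
}
```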
2. I added a "minorVersion=" to the output of HFilePrettyPrinter.
Stack, will you be able to run "bin/hbase hfile -m -f filename" on your
cluster to verify that this checksum feature is switched on? If it prints
minorVersion=1, then you are using this feature.
Do you still need a print somewhere saying that this feature is on? The
older files that were pre-created before this patch was deployed will still
use hdfs-checksum verification, so you could possibly see
hdfs-checksum-verification in stack traces on a live regionserver.
3. I did some thinking (again) on the semantics of major version and minor
version. The major version represents a new file format, e.g. suppose we
add a new field to the file's trailer, then we might need to bump up the
major version. The minor version indicates the format of the data inside an
HFileBlock.
In the current code, major versions 1 and 2 share the same HFileBlock
format (indicated by a minor version of 0). In this patch, we have a new
minorVersion 1 because the data contents inside an HFileBlock have changed.
Technically, both major versions 1 and 2 could have either minorVersion 0
or 1.
Now, suppose we want to add a new field to the trailer of the HFile. We can
bump the majorVersion to 3 but do not change the minorVersion, because we
did not change the internal format of an HFileBlock.
Given the above, does it make sense to say that HFileBlock is independent
of the majorVersion?
REVISION DETAIL
https://reviews.facebook.net/D1521
AFFECTED FILES
src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
src/test/java/org/apache/hadoop/hbase/regionserver/CreateRandomStoreFile.java
src/test/java/org/apache/hadoop/hbase/regionserver/handler/TestCloseRegionHandler.java
src/test/java/org/apache/hadoop/hbase/regionserver/TestFSErrorsExposed.java
src/test/java/org/apache/hadoop/hbase/regionserver/HFileReadWriteTest.java
src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java
src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java
src/test/java/org/apache/hadoop/hbase/io/hfile/TestChecksum.java
src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java
src/test/java/org/apache/hadoop/hbase/io/hfile/TestCacheOnWrite.java
src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileReaderV1.java
src/test/java/org/apache/hadoop/hbase/io/hfile/TestFixedFileTrailer.java
src/test/java/org/apache/hadoop/hbase/io/hfile/CacheTestUtils.java
src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockIndex.java
src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileDataBlockEncoder.java
src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileWriterV2.java
src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java
src/test/java/org/apache/hadoop/hbase/util/MockRegionServerServices.java
src/main/java/org/apache/hadoop/hbase/HConstants.java
src/main/java/org/apache/hadoop/hbase/fs
src/main/java/org/apache/hadoop/hbase/fs/HFileSystem.java
src/main/java/org/apache/hadoop/hbase/util/ChecksumType.java
src/main/java/org/apache/hadoop/hbase/util/CompoundBloomFilter.java
src/main/java/org/apache/hadoop/hbase/util/ChecksumFactory.java
src/main/java/org/apache/hadoop/hbase/regionserver/RegionServerServices.java
src/main/java/org/apache/hadoop/hbase/regionserver/metrics/RegionServerMetrics.java
src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java
src/main/java/org/apache/hadoop/hbase/io/hfile/ChecksumUtil.java
src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java
src/main/java/org/apache/hadoop/hbase/io/hfile/HFileDataBlockEncoderImpl.java
src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java
src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java
src/main/java/org/apache/hadoop/hbase/io/hfile/HFileDataBlockEncoder.java
src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileReader.java
src/main/java/org/apache/hadoop/hbase/io/hfile/NoOpDataBlockEncoder.java
src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java
src/main/java/org/apache/hadoop/hbase/io/hfile/FixedFileTrailer.java
src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java
> support checksums in HBase block cache
> --------------------------------------
>
> Key: HBASE-5074
> URL: https://issues.apache.org/jira/browse/HBASE-5074
> Project: HBase
> Issue Type: Improvement
> Components: regionserver
> Reporter: dhruba borthakur
> Assignee: dhruba borthakur
> Attachments: D1521.1.patch, D1521.1.patch, D1521.10.patch,
> D1521.10.patch, D1521.10.patch, D1521.10.patch, D1521.10.patch,
> D1521.11.patch, D1521.11.patch, D1521.2.patch, D1521.2.patch, D1521.3.patch,
> D1521.3.patch, D1521.4.patch, D1521.4.patch, D1521.5.patch, D1521.5.patch,
> D1521.6.patch, D1521.6.patch, D1521.7.patch, D1521.7.patch, D1521.8.patch,
> D1521.8.patch, D1521.9.patch, D1521.9.patch
>
>
> The current implementation of HDFS stores the data in one block file and the
> metadata (checksum) in another block file. This means that every read into the
> HBase block cache actually consumes two disk iops, one to the data file and
> one to the checksum file. This is a major problem for scaling HBase, because
> HBase is usually bottlenecked on the number of random disk iops that the
> storage hardware offers.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira