[
https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Zhihong Yu updated HBASE-5074:
------------------------------
Comment: was deleted
(was: mbautin has commented on the revision "[jira] [HBASE-5074] Support
checksums in HBase block cache".
Added CCs: dhruba
@Dhruba: The "checksum at the end of block" approach seems reasonable and the
implementation looks good! Specific comments inline.
INLINE COMMENTS
src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileReader.java:357
What is the purpose of the hfs parameter here?
src/main/java/org/apache/hadoop/hbase/io/hfile/FixedFileTrailer.java:49
s/preceeding/preceding/
src/main/java/org/apache/hadoop/hbase/io/hfile/FixedFileTrailer.java:50
s/deermines/determines/
src/main/java/org/apache/hadoop/hbase/io/hfile/FixedFileTrailer.java:51
s/does not need/do not need/
src/main/java/org/apache/hadoop/hbase/io/hfile/FixedFileTrailer.java:119
s/major/minor/
src/main/java/org/apache/hadoop/hbase/io/hfile/FixedFileTrailer.java:260
Rename the existing expectVersion to expectMajorVersion for clarity.
src/main/java/org/apache/hadoop/hbase/io/hfile/FixedFileTrailer.java:343
Rename to expectMajorVersion for clarity.
Also, does the version field of this class now only contain the major
version? If so, rename it to majorVersion.
src/main/java/org/apache/hadoop/hbase/io/hfile/FixedFileTrailer.java:345 Add
the word "major" to the error message.
src/main/java/org/apache/hadoop/hbase/io/hfile/FixedFileTrailer.java:462
Rename to getMajorVersion
src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java:402 Can we modify
the parameter type and get rid of this cast?
src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java:415 This is not a
constructor, but a factory method.
src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java:417 Add "ForTest"
to method name for clarity.
src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:91 s/has/have/
src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:95 Consider
replacing the "_V0" suffix with something more meaningful like "_NO_CHECKSUM".
src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:102 Consider
using a suffix "_WITH_CHECKSUM".
src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:123 When the
number of bytes per checksum becomes configurable, will that require a
persistent data format change? What will the upgrade procedure be in that case?
src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:138 It is not
clear from this call that 0 is minor version. Create a constant with a
meaningful name (e.g. MINOR_VERSION_NO_CHECKSUM).
src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:149 Consider
adding "WithChecksum" to variable name.
src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:395-398 This
is becoming bulky. Factor out the common term (uncompressedSizeWithoutHeader +
headerSize() + cksumBytes) into a local variable. Also avoid evaluating
headerSize() twice.
src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:400-402 Reuse
the new local variables from the above comments here.
src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:441-442 Update
this comment, since the meaning of "extraBytes" has changed from just being the
room for the next block's header to a much more complex role.
src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:757-758 Should
we throw an IOException instead since this method already throws it?
src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:771-772 tmp is
a particularly bad variable name. Combine these two lines and get rid of tmp.
src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:808-809 Get
rid of tmp and combine these two lines.
src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:788 This
method and the above one seem to share a lot of code. Is it possible to get rid
of code duplication?
Also, these two methods seem isolated enough to be moved to another class,
maybe even as static methods.
src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:866-870 Do we
need this in case of minorVersion = 0? Or do we always write new files with
checksums?
src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:973-975
Somehow the fact that checksum format is different for compressed and
uncompressed blocks has escaped me halfway through the review. Maybe it is
worth explicitly mentioning that in javadoc.
src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:999-1011 Use
System.arraycopy instead of loops.
Add "ForTest" to method name to discourage its use in production.
src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:1202 Nice!
Thanks for locking down these internal base classes and methods.
src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:1389-1390
Delete one of these lines.
src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:1520-1527 Does
it make sense to move this checksum instantiation code to a function and reuse
it everywhere we call ChecksumFactory.newInstance()?
src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:1786 Remove
this and other debug output statements.
src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:1871 As I
mentioned, it is probably better to move checksum computation and validation
code to a separate utility class.
src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java:220 Use a
constant to indicate that this is a minor version without checksum support
instead of just 0.
src/main/java/org/apache/hadoop/hbase/util/HFileSystem.java:56 Is this
necessary? Does not Java call default constructors automatically?
src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java:70 This is
for testing only, right?
src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java:225 Long
line.
src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java:1
A lot in this file appears to be copy-paste from TestHFileBlock, so it very
difficult to see the real changes. Please reuse the appropriate code instead.
REVISION DETAIL
https://reviews.facebook.net/D1521
)
> support checksums in HBase block cache
> --------------------------------------
>
> Key: HBASE-5074
> URL: https://issues.apache.org/jira/browse/HBASE-5074
> Project: HBase
> Issue Type: Improvement
> Components: regionserver
> Reporter: dhruba borthakur
> Assignee: dhruba borthakur
> Attachments: D1521.1.patch, D1521.1.patch
>
>
> The current implementation of HDFS stores the data in one block file and the
> metadata(checksum) in another block file. This means that every read into the
> HBase block cache actually consumes two disk iops, one to the datafile and
> one to the checksum file. This is a major problem for scaling HBase, because
> HBase is usually bottlenecked on the number of random disk iops that the
> storage-hardware offers.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira