[ 
https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5074:
------------------------------

    Comment: was deleted

(was: mbautin has commented on the revision "[jira] [HBASE-5074] Support 
checksums in HBase block cache".
Added CCs: dhruba

  @Dhruba: The "checksum at the end of block" approach seems reasonable and the 
implementation looks good! Specific comments inline.

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileReader.java:357 
What is the purpose of the hfs parameter here?
  src/main/java/org/apache/hadoop/hbase/io/hfile/FixedFileTrailer.java:49 
s/preceeding/preceding/
  src/main/java/org/apache/hadoop/hbase/io/hfile/FixedFileTrailer.java:50 
s/deermines/determines/
  src/main/java/org/apache/hadoop/hbase/io/hfile/FixedFileTrailer.java:51 
s/does not need/do not need/
  src/main/java/org/apache/hadoop/hbase/io/hfile/FixedFileTrailer.java:119 
s/major/minor/
  src/main/java/org/apache/hadoop/hbase/io/hfile/FixedFileTrailer.java:260 
Rename the existing expectVersion to expectMajorVersion for clarity.
  src/main/java/org/apache/hadoop/hbase/io/hfile/FixedFileTrailer.java:343 
Rename to expectMajorVersion for clarity.

  Also, does the version field of this class now only contain the major 
version? If so, rename it to majorVersion.
  src/main/java/org/apache/hadoop/hbase/io/hfile/FixedFileTrailer.java:345 Add 
the word "major" to the error message.
  src/main/java/org/apache/hadoop/hbase/io/hfile/FixedFileTrailer.java:462 
Rename to getMajorVersion
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java:402 Can we modify 
the parameter type and get rid of this cast?
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java:415 This is not a 
constructor, but a factory method.
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java:417 Add "ForTest" 
to method name for clarity.
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:91 s/has/have/
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:95 Consider 
replacing the "_V0" suffix with something more meaningful like "_NO_CHECKSUM".
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:102 Consider 
using a suffix "_WITH_CHECKSUM".
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:123 When the 
number of bytes per checksum becomes configurable, will that require a 
persistent data format change? What will the upgrade procedure be in that case?
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:138 It is not 
clear from this call that 0 is minor version. Create a constant with a 
meaningful name (e.g. MINOR_VERSION_NO_CHECKSUM).
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:149 Consider 
adding "WithChecksum" to variable name.
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:395-398 This 
is becoming bulky. Factor out the common term (uncompressedSizeWithoutHeader + 
headerSize() + cksumBytes) into a local variable. Also avoid evaluating 
headerSize() twice.
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:400-402 Reuse 
the new local variables from the above comments here.
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:441-442 Update 
this comment, since the meaning of "extraBytes" has changed from just being the 
room for the next block's header to a much more complex role.
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:757-758 Should 
we throw an IOException instead since this method already throws it?
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:771-772 tmp is 
a particularly bad variable name. Combine these two lines and get rid of tmp.
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:808-809 Get 
rid of tmp and combine these two lines.
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:788 This 
method and the above one seem to share a lot of code. Is it possible to get rid 
of code duplication?

  Also, these two methods seem isolated enough to be moved to another class, 
maybe even as static methods.
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:866-870 Do we 
need this in case of minorVersion = 0? Or do we always write new files with 
checksums?
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:973-975 
Somehow the fact that checksum format is different for compressed and 
uncompressed blocks has escaped me halfway through the review. Maybe it is 
worth explicitly mentioning that in javadoc.
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:999-1011 Use 
System.arraycopy instead of loops.

  Add "ForTest" to method name to discourage its use in production.
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:1202 Nice! 
Thanks for locking down these internal base classes and methods.
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:1389-1390 
Delete one of these lines.
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:1520-1527 Does 
it make sense to move this checksum instantiation code to a function and reuse 
it everywhere we call ChecksumFactory.newInstance()?
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:1786 Remove 
this and other debug output statements.
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:1871 As I 
mentioned, it is probably better to move checksum computation and validation 
code to a separate utility class.
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java:220 Use a 
constant to indicate that this is a minor version without checksum support 
instead of just 0.
  src/main/java/org/apache/hadoop/hbase/util/HFileSystem.java:56 Is this 
necessary? Does not Java call default constructors automatically?
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java:70 This is 
for testing only, right?
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java:225 Long 
line.
  
src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java:1
 A lot in this file appears to be copy-paste from TestHFileBlock, so it very 
difficult to see the real changes. Please reuse the appropriate code instead.

REVISION DETAIL
  https://reviews.facebook.net/D1521
)
    
> support checksums in HBase block cache
> --------------------------------------
>
>                 Key: HBASE-5074
>                 URL: https://issues.apache.org/jira/browse/HBASE-5074
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: D1521.1.patch, D1521.1.patch
>
>
> The current implementation of HDFS stores the data in one block file and the 
> metadata(checksum) in another block file. This means that every read into the 
> HBase block cache actually consumes two disk iops, one to the datafile and 
> one to the checksum file. This is a major problem for scaling HBase, because 
> HBase is usually bottlenecked on the number of random disk iops that the 
> storage-hardware offers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to