[
https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13199072#comment-13199072
]
Phabricator commented on HBASE-5074:
------------------------------------
dhruba has commented on the revision "[jira] [HBASE-5074] Support checksums in
HBase block cache".
INLINE COMMENTS
src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java:206-207
Will fix.
src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java:1072 Sure.
src/main/java/org/apache/hadoop/hbase/HConstants.java:598 I will make this
part of the code cleaner. I am still hoping to keep only one knob: whether to
verify HBase checksums or not. If HBase checksums are switched on, then HDFS
checksums will automatically be switched off; if HBase checksums are configured
off, then HDFS checksums will automatically be switched on. I feel that the
other knobs (e.g. no checksums at all, or both checksums) are not very
interesting in *any* production environment, and I would like to keep the code
complexity a little lower by avoiding those two combinations. Hope that is ok
with you.
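A minimal sketch of this one-knob behavior (the configuration key name and
default here are illustrative assumptions, not necessarily what the patch
uses; FileSystem.setVerifyChecksum() is the standard Hadoop API for toggling
HDFS-level verification):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;

    public class ChecksumKnobExample {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // One boolean decides which layer verifies checksums.
        boolean useHBaseChecksum =
            conf.getBoolean("hbase.regionserver.checksum.verify", true);
        FileSystem fs = FileSystem.get(conf);
        // Exactly one of the two mechanisms is active: when HBase verifies
        // its own inline checksums, HDFS verification (and the extra iop
        // against the separate checksum file) is switched off.
        fs.setVerifyChecksum(!useHBaseChecksum);
      }
    }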
src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java:3597 Good
idea, will do
src/main/java/org/apache/hadoop/hbase/util/ChecksumByteArrayOutputStream.java:31
I tried this, but it needs a few changes, so I ended up needing my own object
wrapper over DataOutputBuffer.
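A sketch of the kind of wrapper described, assuming the goal is direct access
to the backing array so checksum bytes can be written into the serialized
block without an extra copy (class shape and method name are illustrative):

    import java.io.ByteArrayOutputStream;

    // Illustrative only: expose the internal buffer so checksums can be
    // computed over, and written into, the serialized bytes in place.
    public class ChecksumByteArrayOutputStream extends ByteArrayOutputStream {
      public ChecksumByteArrayOutputStream(int size) {
        super(size);
      }

      /** The backing array; only bytes [0, size()) are valid. */
      public byte[] getBuffer() {
        return this.buf;
      }
    }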
src/main/java/org/apache/hadoop/hbase/util/ChecksumFactory.java:39 I too feel
that we should add the checksum type to the hfileblock header. That will make
us future-proof for trying new checksum algorithms. Will make this change.
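A sketch of what a checksum-type code in the block header might look like
(the enum values and byte codes here are illustrative assumptions):

    // Illustrative: a one-byte code in every block header identifies the
    // checksum algorithm, so adding an algorithm later needs no format
    // change.
    public enum ChecksumType {
      NULL((byte) 0), CRC32((byte) 1), CRC32C((byte) 2);

      private final byte code;

      ChecksumType(byte code) {
        this.code = code;
      }

      public byte getCode() {
        return code;
      }

      // A reader dispatches on the byte it finds in the header.
      public static ChecksumType codeToType(byte b) {
        for (ChecksumType t : values()) {
          if (t.code == b) {
            return t;
          }
        }
        throw new IllegalArgumentException("Unknown checksum code: " + b);
      }
    }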
src/main/java/org/apache/hadoop/hbase/util/HFileSystem.java:132-133 This is
equivalent to the existing FileSystem.get(), and many places in HBase use it.
src/main/java/org/apache/hadoop/hbase/util/HFileSystem.java:80 I will make
this public so that users can create an HFileSystem object on a non-default
path.
src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:102 I am
making changes here based on Mikhail's suggestion too.
src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:229 As you
can see, the existing code path that creates an HFileBlock using this
constructor uses it only for in-memory caching, so it never fills in or uses
the onDiskDataSizeWithHeader field. But I will set it to what you propose.
src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:252
onDiskSizeWithHeader = onDiskDataSizeWithHeader + checksum bytes
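A worked example of that relation (the header size, chunk size, and checksum
width below are assumed values for illustration only):

    public class BlockSizeExample {
      public static void main(String[] args) {
        // Illustrative arithmetic: checksums are computed per chunk of
        // data and appended after it, so the full on-disk size is the
        // data+header size plus one checksum value per chunk.
        int bytesPerChecksum = 16 * 1024;          // assumed chunk size
        int checksumSize = 4;                      // CRC32 is 4 bytes
        int onDiskDataSizeWithHeader = 33 + 65536; // assumed header + data
        int numChunks = (onDiskDataSizeWithHeader + bytesPerChecksum - 1)
            / bytesPerChecksum;
        int onDiskSizeWithHeader =
            onDiskDataSizeWithHeader + numChunks * checksumSize;
        System.out.println(onDiskSizeWithHeader); // 65569 + 5 * 4 = 65589
      }
    }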
src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:751 I am in
complete agreement with you. I wish I could have used the Hadoop trunk code
here. Unfortunately, HBase pulls in Hadoop 1.0, which does not have this
implementation. Another option is to copy this code from Hadoop into the
HBase code base, but that has its own set of maintainability problems. I am
hoping that HBase will move to Hadoop 2.0 very soon, and then we can adopt
the more optimal checksum implementation. Hope that is ok with you.
src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:1401-1402 This
needs to be thread-safe.
src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:1634 This is
an internal method, and this error is handled by the upper layers (by
switching off HBase checksums). So I am following the paradigm of using
exceptions only when true errors happen; I would like to avoid writing code
that generates exceptions in one layer only to catch and handle them in
another layer. The discussion with Doug Cutting on the hdfs-symlink patch is
etched in my mind :-)
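A sketch of that status-return style (all names here are illustrative, not
the actual patch): the internal validator reports a mismatch through its
return value, and only the caller decides how to recover.

    import java.util.zip.CRC32;

    public class ChecksumFallbackExample {
      // Reports failure via the return value instead of throwing, since a
      // mismatch here is an expected, recoverable condition.
      static boolean validateChecksum(byte[] data, long storedChecksum) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue() == storedChecksum;
      }

      static byte[] readBlock(byte[] data, long storedChecksum) {
        if (!validateChecksum(data, storedChecksum)) {
          // Recoverable: fall back to the HDFS-checksummed read path
          // rather than propagating an exception up the stack.
          return reReadWithHdfsChecksum();
        }
        return data;
      }

      static byte[] reReadWithHdfsChecksum() {
        return new byte[0]; // placeholder for the fallback read path
      }
    }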
src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:1888 I will
work in a later patch to use bulk checksum verification, native code, etc.
(from Hadoop). I would like to keep this patch smaller than it already is by
focusing on the disk format change, compatibility with older versions, etc.
The main reason is that most of the Hadoop checksum optimizations are only in
Hadoop 2.0. I am hoping that is ok with you.
REVISION DETAIL
https://reviews.facebook.net/D1521
> support checksums in HBase block cache
> --------------------------------------
>
> Key: HBASE-5074
> URL: https://issues.apache.org/jira/browse/HBASE-5074
> Project: HBase
> Issue Type: Improvement
> Components: regionserver
> Reporter: dhruba borthakur
> Assignee: dhruba borthakur
> Attachments: D1521.1.patch
>
>
> The current implementation of HDFS stores the data in one block file and the
> metadata (checksum) in another block file. This means that every read into
> the HBase block cache actually consumes two disk iops, one to the datafile
> and one to the checksum file. This is a major problem for scaling HBase,
> because HBase is usually bottlenecked on the number of random disk iops that
> the storage hardware offers.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira