[ https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13217094#comment-13217094 ]

Phabricator commented on HBASE-5074:
------------------------------------

dhruba has commented on the revision "[jira] [HBASE-5074] Support checksums in 
HBase block cache".

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:451-452 I
think it is better not to add another 4 bytes to HFileBlock (it increases
heapSize); instead, just compute the value when needed, especially since this
method is used only for debugging.
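
  For illustration only, a minimal sketch of the shape I mean (the class and
field names are placeholders, not the patch's actual code):

    class BlockSketch {
      private final int onDiskDataSize;    // bytes of block data on disk
      private final int bytesPerChecksum;  // chunk size covered by one checksum

      BlockSketch(int onDiskDataSize, int bytesPerChecksum) {
        this.onDiskDataSize = onDiskDataSize;
        this.bytesPerChecksum = bytesPerChecksum;
      }

      // Derived on demand rather than cached in a 4-byte field; since this
      // is only reached from debug output, recomputing is essentially free.
      int numChecksumChunks() {
        return (onDiskDataSize + bytesPerChecksum - 1) / bytesPerChecksum;
      }
    }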
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:529-530 Shall
we avoid increasing the heap size and compute headerSize() instead? Computing
headerSize() should be really cheap, especially since it is likely to be
inlined.
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:1636 I think
we should always print this. That follows the precedent set in other parts of
the HBase code, and this code path is the exception, not the norm.
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:1642-1644 I am
pretty sure it is better to construct this message only when there is a
checksum mismatch.
  src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java:3610-3612 The
secret is to pass an HFileSystem into HRegion.newHRegion(). This HFileSystem is
extracted from the RegionServerServices if it is non-null; otherwise, a default
file system object is created and passed into HRegion.newHRegion().
  src/main/java/org/apache/hadoop/hbase/util/ChecksumType.java:57-60 getName()
is better because it allows annotating the name differently from what Java
does via toString() (especially if we add new CRC algorithms in the future).
  src/main/java/org/apache/hadoop/hbase/util/ChecksumType.java:143-144 I would
like to keep getName() because it allows us to leave the API unchanged if we
decide to override Java's toString() convention, especially if we add new
checksum algorithms in the future. (Similar to why Enum.name() and
Enum.toString() are two separate methods.)
  src/main/java/org/apache/hadoop/hbase/util/ChecksumType.java:179 That's
right. But the existence of this API allows us to use our own names in the
future. (Also, when there are only two or three values, this might be faster
than a map lookup.)
  
src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java:1
I am not planning to change that; this code is what was in HFileBlock, so it
is good to carry it over into a unit test so we can generate files in the
older format. This is used by unit tests alone.

  Just replacing it with pre-created file(s) is not very appealing, because
the pre-created file(s) would test only those exact files, whereas if we keep
this code here, we can write more and more unit tests in the future that
generate different files in the older format and test backward compatibility.

REVISION DETAIL
  https://reviews.facebook.net/D1521

> support checksums in HBase block cache
> --------------------------------------
>
>                 Key: HBASE-5074
>                 URL: https://issues.apache.org/jira/browse/HBASE-5074
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: D1521.1.patch, D1521.1.patch, D1521.10.patch, 
> D1521.10.patch, D1521.2.patch, D1521.2.patch, D1521.3.patch, D1521.3.patch, 
> D1521.4.patch, D1521.4.patch, D1521.5.patch, D1521.5.patch, D1521.6.patch, 
> D1521.6.patch, D1521.7.patch, D1521.7.patch, D1521.8.patch, D1521.8.patch, 
> D1521.9.patch, D1521.9.patch
>
>
> The current implementation of HDFS stores the data in one block file and the
> metadata (checksums) in another block file. This means that every read into
> the HBase block cache actually consumes two disk iops: one to the data file
> and one to the checksum file. This is a major problem for scaling HBase,
> because HBase is usually bottlenecked on the number of random disk iops that
> the storage hardware offers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira