[ https://issues.apache.org/jira/browse/HBASE-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13485756#comment-13485756 ]

Ted Yu commented on HBASE-6597:
-------------------------------

Update on my recent findings.
I came up with a patch for the 0.94 branch.
Most tests related to data block encoding pass.
TestHFileBlockCompatibility poses a challenge. The 0.89-fb branch has no embedded checksum feature, so this test exists only in 0.94 / trunk.
The test contains a copy of the Writer class, which I assume shouldn't be modified, at least not for a point release.
The test also reuses code from TestHFileBlock.java, where the usage of Writer has changed:
{code}
- static int writeTestKeyValues(OutputStream dos, int seed, boolean includesMemstoreTS)
+ static void writeTestKeyValues(OutputStream dos, Writer hbw, int seed, boolean includesMemstoreTS)
{code}
This is the test failure I am observing now:
{code}
testDataBlockEncoding[0](org.apache.hadoop.hbase.io.hfile.TestHFileBlockCompatibility)  Time elapsed: 0.129 sec  <<< FAILURE!
org.junit.ComparisonFailure: Content mismath for compression NONE, encoding PREFIX, pread false, commonPrefix 2, expected length 1859, actual length 1859 expected:<\x00\x00\x0[B\xB8]*\x0A\x00\x00\x0A\x0...> but was:<\x00\x00\x0[0\x00]*\x0A\x00\x00\x0A\x0...>
  at org.junit.Assert.assertEquals(Assert.java:125)
  at org.apache.hadoop.hbase.io.hfile.TestHFileBlock.assertBuffersEqual(TestHFileBlock.java:463)
  at org.apache.hadoop.hbase.io.hfile.TestHFileBlockCompatibility.testDataBlockEncoding(TestHFileBlockCompatibility.java:337)
{code}
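Both buffers have the same length (1859), so the mismatch is in content, not size. JUnit's ComparisonFailure wraps the differing segment in brackets, so, assuming the rendering uses `\xNN` escapes for non-printable bytes, the expected buffer starts with 00 00 0B B8 and the actual one with 00 00 00 00. Read as a big-endian int, that is 3000 versus 0. Which field those four bytes hold is not established by the failure output. The small helper below is a hypothetical diagnostic (`BufferDiff`, `firstMismatch` and `intAt` are illustrative names, not HBase test utilities) for locating the first differing offset and decoding the int at the start of the buffer:

```java
import java.nio.ByteBuffer;

// Hypothetical diagnostic helper; not part of the HBase test suite.
final class BufferDiff {
    /** Returns the index of the first differing byte, or -1 if the arrays are identical. */
    static int firstMismatch(byte[] expected, byte[] actual) {
        int n = Math.min(expected.length, actual.length);
        for (int i = 0; i < n; i++) {
            if (expected[i] != actual[i]) {
                return i;
            }
        }
        // Equal prefixes: the arrays differ only if their lengths differ.
        return expected.length == actual.length ? -1 : n;
    }

    /** Reads a 4-byte int at the given offset; ByteBuffer defaults to big-endian order. */
    static int intAt(byte[] buf, int offset) {
        return ByteBuffer.wrap(buf, offset, 4).getInt();
    }

    public static void main(String[] args) {
        // Leading bytes as rendered in the failure message (assumed \xNN escapes).
        byte[] expected = {0x00, 0x00, 0x0B, (byte) 0xB8, 0x2A, 0x0A, 0x00, 0x00, 0x0A};
        byte[] actual   = {0x00, 0x00, 0x00, 0x00, 0x2A, 0x0A, 0x00, 0x00, 0x0A};
        System.out.println("first mismatch at offset " + firstMismatch(expected, actual));
        System.out.println("expected int@0 = " + intAt(expected, 0)
                + ", actual int@0 = " + intAt(actual, 0));
    }
}
```

Running it prints a first mismatch at offset 2, with 3000 expected and 0 actual, which suggests that a leading int field is left unset (zero) on the code path under test.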
                
> Block Encoding Size Estimation
> ------------------------------
>
>                 Key: HBASE-6597
>                 URL: https://issues.apache.org/jira/browse/HBASE-6597
>             Project: HBase
>          Issue Type: Improvement
>          Components: io
>    Affects Versions: 0.89-fb
>            Reporter: Brian Nixon
>            Assignee: Mikhail Bautin
>            Priority: Minor
>         Attachments: 6597-trunk.txt, D5895.1.patch, D5895.2.patch, D5895.3.patch, D5895.4.patch, D5895.5.patch
>
>
> Block boundaries, as created by the current writers, are determined by the size
> of the unencoded data. However, blocks in memory are kept encoded. By using an
> estimate of the encoded size of the block, we can get more consistent block
> sizes.
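The description does not say how the encoded size is estimated. One possible approach, sketched here purely as an assumption, is to learn the encoded-to-unencoded ratio from blocks that have already been encoded and close the open block once its raw size, scaled by that ratio, reaches the target. `EncodedSizeEstimator` and its methods are hypothetical names, not HBase APIs, and this is not necessarily what the attached patches do:

```java
// Hypothetical sketch: choose block boundaries from an *estimated* encoded size.
final class EncodedSizeEstimator {
    private final int targetEncodedBlockSize;
    private long totalUnencoded;   // raw bytes of all finished blocks
    private long totalEncoded;     // encoded bytes of all finished blocks
    private long currentUnencoded; // raw bytes appended to the open block

    EncodedSizeEstimator(int targetEncodedBlockSize) {
        this.targetEncodedBlockSize = targetEncodedBlockSize;
    }

    /** Records a key/value of the given raw (unencoded) size in the open block. */
    void append(int unencodedKeyValueSize) {
        currentUnencoded += unencodedKeyValueSize;
    }

    /** Estimated encoded size of the open block, scaled by the observed ratio. */
    long estimatedEncodedSize() {
        if (totalUnencoded == 0) {
            return currentUnencoded; // no history yet: assume encoding does not shrink data
        }
        return currentUnencoded * totalEncoded / totalUnencoded;
    }

    /** True once the estimate reaches the target, i.e. the block should be closed. */
    boolean shouldFinishBlock() {
        return estimatedEncodedSize() >= targetEncodedBlockSize;
    }

    /** Called after the block has actually been encoded, to refine the ratio. */
    void finishBlock(int actualEncodedSize) {
        totalUnencoded += currentUnencoded;
        totalEncoded += actualEncodedSize;
        currentUnencoded = 0;
    }

    /** Appends fixed-size key/values until the block should close; returns how many fit. */
    static int fillBlock(EncodedSizeEstimator est, int kvSize) {
        int count = 0;
        while (!est.shouldFinishBlock()) {
            est.append(kvSize);
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        EncodedSizeEstimator est = new EncodedSizeEstimator(1000);
        int first = fillBlock(est, 100);  // no history: closes at 1000 raw bytes
        est.finishBlock(400);             // pretend encoding shrank 1000 bytes to 400
        int second = fillBlock(est, 100); // ratio 0.4: closes at 2500 raw bytes
        System.out.println(first + " " + second);
    }
}
```

With a 1000-byte target and 100-byte key/values, the first block closes after 10 entries. After observing a 0.4 ratio, the second block holds 25 entries, so both blocks end up near 1000 encoded bytes instead of 1000 raw bytes. A running ratio adapts to the data but lags when compressibility changes abruptly; a per-encoder estimate would be an alternative.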

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
