[
https://issues.apache.org/jira/browse/HBASE-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13485756#comment-13485756
]
Ted Yu commented on HBASE-6597:
-------------------------------
Update on my recent findings.
I came up with a patch for the 0.94 branch.
Most data block encoding related tests pass.
TestHFileBlockCompatibility poses a little challenge. There is no embedded
checksum feature in the 0.89-fb branch, so this test is unique to 0.94 / trunk.
The test contains a copy of the Writer class which I assume shouldn't be
modified, at least not for a point release.
The test reuses some code from TestHFileBlock.java, where the usage of Writer
has changed:
{code}
- static int writeTestKeyValues(OutputStream dos, int seed, boolean includesMemstoreTS)
+ static void writeTestKeyValues(OutputStream dos, Writer hbw, int seed, boolean includesMemstoreTS)
{code}
This is the test failure I am observing now:
{code}
testDataBlockEncoding[0](org.apache.hadoop.hbase.io.hfile.TestHFileBlockCompatibility)
Time elapsed: 0.129 sec <<< FAILURE!
org.junit.ComparisonFailure: Content mismath for compression NONE, encoding
PREFIX, pread false, commonPrefix 2, expected length 1859, actual length 1859
expected:<\x00\x00\x0[B\xB8]*\x0A\x00\x00\x0A\x0...> but
was:<\x00\x00\x0[0\x00]*\x0A\x00\x00\x0A\x0...>
at org.junit.Assert.assertEquals(Assert.java:125)
at org.apache.hadoop.hbase.io.hfile.TestHFileBlock.assertBuffersEqual(TestHFileBlock.java:463)
at org.apache.hadoop.hbase.io.hfile.TestHFileBlockCompatibility.testDataBlockEncoding(TestHFileBlockCompatibility.java:337)
{code}
> Block Encoding Size Estimation
> ------------------------------
>
> Key: HBASE-6597
> URL: https://issues.apache.org/jira/browse/HBASE-6597
> Project: HBase
> Issue Type: Improvement
> Components: io
> Affects Versions: 0.89-fb
> Reporter: Brian Nixon
> Assignee: Mikhail Bautin
> Priority: Minor
> Attachments: 6597-trunk.txt, D5895.1.patch, D5895.2.patch,
> D5895.3.patch, D5895.4.patch, D5895.5.patch
>
>
> Block boundaries as created by current writers are determined by the size of
> the unencoded data. However, blocks in memory are kept encoded. By using an
> estimate for the encoded size of the block, we can get greater consistency in
> block size.
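The estimation idea described in the issue can be sketched roughly as follows. This is a minimal, self-contained illustration, not HBase's actual implementation: the EncodedSizeEstimator class name and the running-ratio heuristic are assumptions made for the example.
{code}
// Hypothetical sketch: track the observed encoded/unencoded ratio of past
// blocks and use it to predict the encoded size of the block being written,
// so the block can be closed based on its estimated *encoded* size.
public class EncodedSizeEstimator {
  private long unencodedBytes = 0;
  private long encodedBytes = 0;

  // Record the sizes of a finished block to refine the ratio.
  void recordBlock(long unencoded, long encoded) {
    unencodedBytes += unencoded;
    encodedBytes += encoded;
  }

  // Predict the encoded size of a block from its unencoded size.
  long estimateEncodedSize(long unencodedSize) {
    if (unencodedBytes == 0) {
      return unencodedSize; // no history yet: assume a ratio of 1.0
    }
    double ratio = (double) encodedBytes / unencodedBytes;
    return (long) (unencodedSize * ratio);
  }

  // Close the block when its estimated encoded size reaches the target,
  // rather than when the raw unencoded size does.
  boolean shouldFinishBlock(long currentUnencodedSize, long targetBlockSize) {
    return estimateEncodedSize(currentUnencodedSize) >= targetBlockSize;
  }

  public static void main(String[] args) {
    EncodedSizeEstimator est = new EncodedSizeEstimator();
    est.recordBlock(65536, 32768); // observed roughly 2:1 encoding
    System.out.println(est.estimateEncodedSize(65536)); // prints 32768
    System.out.println(est.shouldFinishBlock(65536, 65536)); // prints false
  }
}
{code}
With a heuristic like this, a writer would accumulate roughly targetBlockSize bytes of *encoded* data per block, instead of closing blocks whose encoded sizes vary widely with the data's compressibility.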
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira