[
https://issues.apache.org/jira/browse/HBASE-12313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14184438#comment-14184438
]
Anoop Sam John edited comment on HBASE-12313 at 10/26/14 10:16 AM:
-------------------------------------------------------------------
{code}
for (Cell cell : rr.rawCells()) {
- resultSize += CellUtil.estimatedLengthOf(cell);
+ resultSize += CellUtil.estimatedSerializedSizeOf(cell);
{code}
estimatedLengthOf was returning the total length. estimatedSerializedSizeOf()
is having extra count 4 bytes +. Do you want to change really Stack?
Replacing estimatedLengthOf with estimatedSerializedSizeOf is correct?
{code}
+ private static int getSumOfKeyElementLengths(final Cell cell) {
+ return cell.getRowLength() + cell.getFamilyLength() +
+ cell.getQualifierLength() +
+ cell.getValueLength() +
+ cell.getTagsLength() +
+ KeyValue.TIMESTAMP_TYPE_SIZE;
+ }
+
+ public static int estimatedSerializedSizeOfKey(final Cell cell) {
+ if (cell instanceof KeyValue) return ((KeyValue)cell).getKeyLength();
+ // This will be a low estimate. Will do for now.
+ return getSumOfKeyElementLengths(cell);
+ }
{code}
getSumOfKeyElementLengths - including lengths of tags and value?
{code}
- return cell.getRowLength() + cell.getFamilyLength() +
cell.getQualifierLength()
- + cell.getValueLength() + cell.getTagsLength() +
KeyValue.TIMESTAMP_TYPE_SIZE;
+ // TODO: Add sizing of references that hold the row, family, etc., arrays.
+ return estimatedSerializedSizeOf(cell);
{code}
No need to add the extra 4 bytes for heapSize which will come in
estimatedSerializedSizeOf (?)
{code}
+ public static String getCellKeyAsString(Cell cell) {
+ StringBuilder sb = new StringBuilder(Bytes.toStringBinary(
+ cell.getRowArray(), cell.getRowOffset(), cell.getRowLength()));
+ sb.append(cell.getFamilyLength() == 0? "":
+ Bytes.toStringBinary(cell.getFamilyArray(), cell.getFamilyOffset(),
cell.getFamilyLength()));
+ sb.append(cell.getQualifierLength() == 0? "":
+ Bytes.toStringBinary(cell.getQualifierArray(), cell.getQualifierOffset(),
+ cell.getQualifierLength()));
{code}
Can we add a separator in between rk, f and q parts?
{code}
- // h goes to the next block
- assertEquals(-2, scanner.seekTo(toKV("h", tagUsage)));
+ // 'h' does not exist so we will get a '1' back for not found.
+ assertEquals(0, scanner.seekTo(toKV("i", tagUsage)));
assertEquals("i", toRowStr(scanner.getKeyValue()));
{code}
What if we do seekTo 'h' only ?
{code}
- assertEquals(1, blockIndexReader.rootBlockContainingKey(
- toKV("h", tagUsage)));
+ // 'h', being midpoint between 'g' and 'i', used to be the block index key
because of the
+ // little optimization done creating block index keys where we try to get
a midpoint and then
+ // make this midpoint short as possible so index blocks are kept tight.
But now, we won't do
+ // the 'optimization' -- create new key -- if there is no gain to be had
by way of making
+ // a shorter key; in this case we just use the start key in the index.
This means the below
+ // test changes. Looking for 'h', it'll be in the 0 block rather than 1
block now (though 'h'
+ // does not exist in this file).
+ assertEquals(0, blockIndexReader.rootBlockContainingKey(toKV("h",
tagUsage)));
{code}
Read your comment to see why is the change. Will this change in mid point calc
make any issue in reads?
was (Author: anoop.hbase):
{code}
for (Cell cell : rr.rawCells()) {
- resultSize += CellUtil.estimatedLengthOf(cell);
+ resultSize += CellUtil.estimatedSerializedSizeOf(cell);
{code}
estimatedLengthOf was returning the total length. estimatedSerializedSizeOf()
is having extra count 4 bytes +. Do you want to change really Stack?
Replacing estimatedLengthOf with estimatedSerializedSizeOf is correct?
> Redo the hfile index length optimization so cell-based rather than serialized
> KV key
> ------------------------------------------------------------------------------------
>
> Key: HBASE-12313
> URL: https://issues.apache.org/jira/browse/HBASE-12313
> Project: HBase
> Issue Type: Sub-task
> Components: regionserver, Scanners
> Reporter: stack
> Assignee: stack
> Attachments:
> 0001-HBASE-12313-Redo-the-hfile-index-length-optimization.patch,
> 0001-HBASE-12313-Redo-the-hfile-index-length-optimization.patch,
> 0001-HBASE-12313-Redo-the-hfile-index-length-optimization.patch,
> 0001-HBASE-12313-Redo-the-hfile-index-length-optimization.patch,
> 0001-HBASE-12313-Redo-the-hfile-index-length-optimization.patch, 12313v5.txt
>
>
> Trying to remove API that returns the 'key' of a KV serialized into a byte
> array is thorny.
> I tried to move over the first and last key serializations and the hfile
> index entries to be cell but patch was turning massive. Here is a smaller
> patch that just redoes the optimization that tries to find 'short' midpoints
> between last key of last block and first key of next block so it is
> Cell-based rather than byte array based (presuming Keys serialized in a
> certain way). Adds unit tests which we didn't have before.
> Also remove CellKey. Not needed... at least not yet. Its just utility for
> toString.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)