[jira] [Commented] (HBASE-12313) Redo the hfile index length optimization so cell-based rather than serialized KV key

stack (JIRA) Sun, 26 Oct 2014 09:55:38 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-12313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14184538#comment-14184538
 ]


stack commented on HBASE-12313:
-------------------------------

bq. Do you want to change really Stack?

This patch cleans up the CellUtil methods that do size counting.  There were a 
few too many methods, each only slightly different from the others.  In this 
particular case, we are just doing an estimate, and serialized size is probably 
closest to what we are putting on the wire at this stage.  I don't see a 
problem with it being slightly different from what was there before (what was 
there before was also an 'estimate').  Do you?

bq. Replacing estimatedLengthOf with estimatedSerializedSizeOf is correct?

Where we were using estimatedLengthOf, we were talking about serialized size.  
(What is estimatedLengthOf anyway, smile?  Serialized 'length' or size on heap?  
Or size of the serialized KeyValue byte array, which is going away?)  I thought 
estimatedSerializedSizeOf was more appropriate in the places where I did the 
replacement.

bq. No need to add the extra 4 bytes for heapSize which will come in 
estimatedSerializedSizeOf

Are you referring to the TODO? I'd think that serialized size and heap size 
will be calculated differently when we get around to it.

bq. Can we add a separator in between rk, f and q parts?

Whoops.  Will fix.

bq. What if we do seekTo 'h' only ?

There is no 'h' in the dataset.  It was an 'artificial' midpoint.  If you seek 
to 'h', you end up in the second block, which starts with 'i'.

bq. Will this change in mid point calc make any issue in reads?

I don't believe so.  This whole area was without tests previously.  I made the 
mid calc code stand apart and added a bunch of tests in this patch.  While 
making this patch, I also put both the old code and the new in place and threw 
an exception whenever their results differed as our unit test suite ran.  I 
looked at each case to see if the difference was legit.  What I found was that 
the differences arose because the old code made midkeys even when there was no 
advantage (as in the above 'h' case: no need to make a midkey if all sizes are 
the same).
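
To illustrate the idea of a 'short' midpoint that is only produced when it saves 
space, here is a minimal sketch.  It is not the code in the patch: it works on 
plain byte arrays rather than Cells, and the class and method names 
(ShortMidpointSketch, shortSeparator, compareUnsigned, utf8) are made up for 
this example.  It assumes keys compare as unsigned bytes in lexicographic order 
and picks the shortest prefix of the next block's first key that still sorts 
after the previous block's last key.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration only; not HBase code.
final class ShortMidpointSketch {

    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Lexicographic comparison treating each byte as unsigned.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    // Returns a key k with lastOfPrev < k <= firstOfNext that is as short as
    // a prefix of firstOfNext can be. If no prefix is shorter than
    // firstOfNext itself, firstOfNext is returned unchanged, since an
    // artificial key would bring no advantage.
    static byte[] shortSeparator(byte[] lastOfPrev, byte[] firstOfNext) {
        if (compareUnsigned(lastOfPrev, firstOfNext) >= 0) {
            throw new IllegalArgumentException("lastOfPrev must sort before firstOfNext");
        }
        int n = Math.min(lastOfPrev.length, firstOfNext.length);
        int i = 0;
        while (i < n && lastOfPrev[i] == firstOfNext[i]) {
            i++;
        }
        // Because lastOfPrev < firstOfNext, i is always < firstOfNext.length here.
        byte[] candidate = Arrays.copyOf(firstOfNext, i + 1);
        return candidate.length < firstOfNext.length ? candidate : firstOfNext;
    }

    public static void main(String[] args) {
        byte[] mid = shortSeparator(utf8("abc"), utf8("abxyz"));
        System.out.println(new String(mid, StandardCharsets.UTF_8)); // prints "abx"
    }
}
```

With single-byte keys such as 'g' and 'i', the sketch returns 'i' itself rather 
than inventing a key in between, which mirrors the 'no advantage' case above.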



> Redo the hfile index length optimization so cell-based rather than serialized 
> KV key
> ------------------------------------------------------------------------------------
>
>                 Key: HBASE-12313
>                 URL: https://issues.apache.org/jira/browse/HBASE-12313
>             Project: HBase
>          Issue Type: Sub-task
>          Components: regionserver, Scanners
>            Reporter: stack
>            Assignee: stack
>         Attachments: 
> 0001-HBASE-12313-Redo-the-hfile-index-length-optimization.patch, 
> 0001-HBASE-12313-Redo-the-hfile-index-length-optimization.patch, 
> 0001-HBASE-12313-Redo-the-hfile-index-length-optimization.patch, 
> 0001-HBASE-12313-Redo-the-hfile-index-length-optimization.patch, 
> 0001-HBASE-12313-Redo-the-hfile-index-length-optimization.patch, 12313v5.txt
>
>
> Trying to remove API that returns the 'key' of a KV serialized into a byte 
> array is thorny.
> I tried to move over the first and last key serializations and the hfile 
> index entries to be cell but patch was turning massive.  Here is a smaller 
> patch that just redoes the optimization that tries to find 'short' midpoints 
> between last key of last block and first key of next block so it is 
> Cell-based rather than byte array based (presuming Keys serialized in a 
> certain way).  Adds unit tests which we didn't have before.
> Also remove CellKey.  Not needed... at least not yet.  It's just a utility 
> for toString.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
