[
https://issues.apache.org/jira/browse/HBASE-12313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14184538#comment-14184538
]
stack commented on HBASE-12313:
-------------------------------
bq. Do you want to change really Stack?
This patch cleans up the CellUtil methods that do size counting. There were a
few too many methods, each only slightly different from the others. In this
particular case, we are just doing an estimate, and serialized size is probably
closest to what we are putting on the wire at this stage. I don't see a problem
with it being slightly different from what was there before (what was there
before was an 'estimate' too). Do you?
bq. Replacing estimatedLengthOf with estimatedSerializedSizeOf is correct?
Where we were using estimatedLengthOf, we were talking serialized size. (What
is 'length' anyway -- smile? Serialized length, or size on heap, or size of the
serialized KeyValue byte array, which is going away?) I was thinking
estimatedSerializedSizeOf more appropriate in the places where I did the
replacement.
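To make 'serialized size' concrete, here is a minimal sketch of such an
estimate. It is not the CellUtil code from the patch: the class name, method
name and FIXED_OVERHEAD constant are hypothetical, and the overhead figure
assumes the classic KeyValue layout (int key length, int value length, short
row length, byte family length, long timestamp, byte type) with no tags.

```java
/** Hypothetical sketch; not the CellUtil implementation from the patch. */
final class CellSizeSketch {
  private CellSizeSketch() {}

  // Assumed fixed per-cell overhead, modeled on the classic KeyValue layout:
  // 4 (key length) + 4 (value length) + 2 (row length) + 1 (family length)
  // + 8 (timestamp) + 1 (type) = 20 bytes. Tags are ignored here.
  static final int FIXED_OVERHEAD = 4 + 4 + 2 + 1 + 8 + 1;

  /** Estimates bytes on the wire: variable-length parts plus fixed overhead. */
  static int estimatedSerializedSize(byte[] row, byte[] family,
                                     byte[] qualifier, byte[] value) {
    return row.length + family.length + qualifier.length + value.length
        + FIXED_OVERHEAD;
  }
}
```

A heap-size estimate would be a separate calculation (object headers,
references, array overhead), which is why the two are not expected to share
one formula.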
bq. No need to add the extra 4 bytes for heapSize which will come in
estimatedSerializedSizeOf
Are you referring to the TODO? I'd think that serialized size and heap size
will be calculated differently when we get around to it.
bq. Can we add a separator in between rk, f and q parts?
Whoops. Will fix.
bq. What if we do seekTo 'h' only ?
There is no 'h' in the dataset. It was an 'artificial' midpoint. If you seek to
'h', you end up in the second block, which starts with 'i'.
bq. Will this change in mid point calc make any issue in reads?
I don't believe so. This whole area was without tests previously. I made the
mid calc code stand apart and added a bunch of tests in this patch. As part of
making this patch, I also put the old code and the new in place side by side
and threw an exception whenever their results differed as our unit test suite
ran. I looked at each case to see if the difference was legit. What I found was
that the differences were because we were making midkeys even when there was no
advantage (as in the above 'h' case -- no need to make a midkey if all sizes
are the same).
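The idea of a 'short' midpoint can be sketched as follows. This is an
illustration over plain byte arrays, not the Cell-based code in the patch;
MidKeySketch and shortSeparator are hypothetical names. Given the last key of
one block (left) and the first key of the next (right), it returns a key that
sorts after left and no later than right, and it hands back right itself when
a made-up key would be no shorter.

```java
import java.util.Arrays;

/** Hypothetical sketch; not the HBase implementation. */
final class MidKeySketch {
  private MidKeySketch() {}

  /**
   * Returns a short key `sep` with left < sep <= right in unsigned
   * lexicographic byte order: the common prefix plus one byte of right.
   */
  static byte[] shortSeparator(byte[] left, byte[] right) {
    if (Arrays.compareUnsigned(left, right) >= 0) {
      throw new IllegalArgumentException("left must sort before right");
    }
    // Find the first position at which the two keys differ.
    int min = Math.min(left.length, right.length);
    int diff = 0;
    while (diff < min && left[diff] == right[diff]) {
      diff++;
    }
    // Because left < right, right always has a byte at index diff.
    int sepLength = diff + 1;
    if (sepLength == right.length) {
      // No size advantage in a made-up key, so keep the real one.
      return right;
    }
    return Arrays.copyOf(right, sepLength);
  }
}
```

With left "the quick brown fox" and right "the who" the separator is "the w".
With single-byte keys such as 'g' and 'i' the method returns 'i' unchanged
rather than inventing an 'h'.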
> Redo the hfile index length optimization so cell-based rather than serialized
> KV key
> ------------------------------------------------------------------------------------
>
> Key: HBASE-12313
> URL: https://issues.apache.org/jira/browse/HBASE-12313
> Project: HBase
> Issue Type: Sub-task
> Components: regionserver, Scanners
> Reporter: stack
> Assignee: stack
> Attachments:
> 0001-HBASE-12313-Redo-the-hfile-index-length-optimization.patch,
> 0001-HBASE-12313-Redo-the-hfile-index-length-optimization.patch,
> 0001-HBASE-12313-Redo-the-hfile-index-length-optimization.patch,
> 0001-HBASE-12313-Redo-the-hfile-index-length-optimization.patch,
> 0001-HBASE-12313-Redo-the-hfile-index-length-optimization.patch, 12313v5.txt
>
>
> Trying to remove API that returns the 'key' of a KV serialized into a byte
> array is thorny.
> I tried to move over the first and last key serializations and the hfile
> index entries to be cell but patch was turning massive. Here is a smaller
> patch that just redoes the optimization that tries to find 'short' midpoints
> between last key of last block and first key of next block so it is
> Cell-based rather than byte array based (presuming Keys serialized in a
> certain way). Adds unit tests which we didn't have before.
> Also remove CellKey. Not needed... at least not yet. It's just a utility for
> toString.