[ 
https://issues.apache.org/jira/browse/HBASE-12313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14187200#comment-14187200
 ] 

stack commented on HBASE-12313:
-------------------------------

bq. So what happens when a scan is created with start key as 'h' and end key as 
'z'. We will start from the 0th block thinking 'h' is in that block and later 
fetch the next block where it starts with 'i'. 

That is correct. When we search the index, block '1' starts with 'i', so if an 
'h' exists, it must be in block '0'.

We do not consult the 'lastkey-in-a-block' when doing the index lookup.  If we 
did, in the test we'd notice that the last key in the block was actually 'f', 
and so we should really be returning '1' instead of '0' -- but this is a 
TODO.
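
To make the lookup above concrete, here is a minimal sketch in Java.  It is 
not the HBase index code: the class name BlockIndexSketch, the method 
findBlock, and the use of plain Strings as keys are all assumptions made for 
illustration.  It only shows that a lookup driven by each block's first key 
sends 'h' to block '0' when block '1' starts with 'i'.

```java
import java.util.Arrays;

// Hypothetical sketch, not HBase code: an index that stores only the first
// key of each block, searched with a plain binary search.
final class BlockIndexSketch {

    // Returns the position of the last block whose first key is <= key,
    // or -1 when key sorts before the first key of every block.
    // firstKeys must be sorted in ascending order.
    static int findBlock(String[] firstKeys, String key) {
        int pos = Arrays.binarySearch(firstKeys, key);
        if (pos >= 0) {
            return pos;               // key is exactly a block's first key
        }
        int insertionPoint = -pos - 1; // index of first entry greater than key
        return insertionPoint - 1;     // the block just before that entry
    }

    public static void main(String[] args) {
        // Block 0 holds 'a'..'f', block 1 starts at 'i' (as in the test
        // discussed above). The index never sees the last key 'f'.
        String[] firstKeys = {"a", "i"};
        System.out.println("h -> block " + findBlock(firstKeys, "h"));
        System.out.println("i -> block " + findBlock(firstKeys, "i"));
    }
}
```

Because the index has no record of the last key 'f', the sketch cannot tell 
that 'h' is absent from block '0'; it only knows block '1' begins after 'h'.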

Let me just restore the midkey to the way it used to work.  My thinking was 
that there is no need for a midkey when there are no savings to be had in 
index size, but you raise the interesting point that even without size 
savings, we could save a seek in the rare case where a key lookup falls 
between the last key of one block and the first key of the next AND it happens 
to fall after the calculated midkey (this is for the case where key elements 
are all the same size; when they are not, we were already doing the old 
behavior).  It turns out the midkey calc behaves as though we were consulting 
the last key in the block (though we aren't), except that it works only 50% of 
the time (when key is > midkey).
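
As a rough illustration of the saved seek, here is a minimal Java sketch.  It 
is not the patch's midkey calculation: MidKeySketch, shortSeparator, and 
findBlock are hypothetical names, keys are plain Strings, and the separator 
rule (bump the first differing character by one when there is room) is a 
simplification chosen only to show the effect.

```java
import java.util.Arrays;

// Hypothetical sketch, not HBase code: index a block under a short key that
// sorts after the previous block's last key and no later than its own first key.
final class MidKeySketch {

    // Returns a key s with lastOfPrev < s <= firstOfNext.
    // Assumes lastOfPrev sorts strictly before firstOfNext.
    static String shortSeparator(String lastOfPrev, String firstOfNext) {
        int n = Math.min(lastOfPrev.length(), firstOfNext.length());
        for (int i = 0; i < n; i++) {
            char l = lastOfPrev.charAt(i);
            char r = firstOfNext.charAt(i);
            if (l != r) {
                if (l + 1 < r) {
                    // Room between the two characters: use the next one up.
                    return lastOfPrev.substring(0, i) + (char) (l + 1);
                }
                // No room: the shortest prefix of firstOfNext that differs.
                return firstOfNext.substring(0, i + 1);
            }
        }
        return firstOfNext; // lastOfPrev is a prefix of firstOfNext
    }

    // Position of the last index entry <= key, or -1 if key precedes them all.
    static int findBlock(String[] indexKeys, String key) {
        int pos = Arrays.binarySearch(indexKeys, key);
        return pos >= 0 ? pos : -pos - 2;
    }

    public static void main(String[] args) {
        // Block 0 ends at 'f', block 1 starts at 'i'.
        String sep = shortSeparator("f", "i");
        String[] indexKeys = {"a", sep};
        System.out.println("separator = " + sep);
        System.out.println("h  -> block " + findBlock(indexKeys, "h"));
        System.out.println("fz -> block " + findBlock(indexKeys, "fz"));
    }
}
```

In this sketch a lookup for 'h' lands past the separator and goes straight to 
block '1', while a lookup for 'fz' sorts before the separator and still 
visits block '0' first, which is the case that is not helped.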



> Redo the hfile index length optimization so cell-based rather than serialized 
> KV key
> ------------------------------------------------------------------------------------
>
>                 Key: HBASE-12313
>                 URL: https://issues.apache.org/jira/browse/HBASE-12313
>             Project: HBase
>          Issue Type: Sub-task
>          Components: regionserver, Scanners
>            Reporter: stack
>            Assignee: stack
>         Attachments: 
> 0001-HBASE-12313-Redo-the-hfile-index-length-optimization.patch, 
> 0001-HBASE-12313-Redo-the-hfile-index-length-optimization.patch, 
> 0001-HBASE-12313-Redo-the-hfile-index-length-optimization.patch, 
> 0001-HBASE-12313-Redo-the-hfile-index-length-optimization.patch, 
> 0001-HBASE-12313-Redo-the-hfile-index-length-optimization.patch, 12313v5.txt, 
> 12313v6.txt, 12313v8.txt
>
>
> Trying to remove API that returns the 'key' of a KV serialized into a byte 
> array is thorny.
> I tried to move over the first and last key serializations and the hfile 
> index entries to be Cell-based, but the patch was turning massive.  Here is a smaller 
> patch that just redoes the optimization that tries to find 'short' midpoints 
> between last key of last block and first key of next block so it is 
> Cell-based rather than byte array based (presuming Keys serialized in a 
> certain way).  Adds unit tests which we didn't have before.
> Also remove CellKey.  Not needed... at least not yet.  It's just a utility for 
> toString.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
