[ 
https://issues.apache.org/jira/browse/HBASE-7342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13533274#comment-13533274
 ] 

Lars Hofhansl commented on HBASE-7342:
--------------------------------------

Just so I understand... Since the code picks the midkey as the row key of the 
first KeyValue of the mid block, it should never pick the first block, because 
that would by definition be the same as the first key of the file (unless there 
is only one block, in which case it should not split anyway).

But it's fine to pick the last block, because the first key in that block 
is likely not the last key of the file (unless there's only a single 
KeyValue in that last block).

Did I understand this correctly?
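
The reasoning above can be checked with a small sketch. It is a simplified model, not HBase code: blocks are lists of row-key strings, and the names SplitPointSketch, midKeyFor and isUsableSplitPoint are hypothetical. It assumes the midkey is the first row key of the chosen block, and that a split is refused when the midkey equals the first or last row of the file, as the error message quoted in the issue suggests.

```java
import java.util.List;

// Simplified model: each block is a list of row keys in sorted order.
// Not the HBase implementation; all names here are illustrative.
final class SplitPointSketch {

    // The midkey is taken to be the first row key of the chosen block.
    static String midKeyFor(List<List<String>> blocks, int blockIndex) {
        return blocks.get(blockIndex).get(0);
    }

    // A split point is usable only if it differs from both the first
    // and the last row key of the file.
    static boolean isUsableSplitPoint(List<List<String>> blocks, int blockIndex) {
        String firstKey = blocks.get(0).get(0);
        List<String> lastBlock = blocks.get(blocks.size() - 1);
        String lastKey = lastBlock.get(lastBlock.size() - 1);
        String midKey = midKeyFor(blocks, blockIndex);
        return !midKey.equals(firstKey) && !midKey.equals(lastKey);
    }
}
```

In this model, choosing block 0 is never usable, and choosing the last block is usable unless that block holds a single key.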

                
> Split operation without split key incorrectly finds the middle key in 
> off-by-one error
> --------------------------------------------------------------------------------------
>
>                 Key: HBASE-7342
>                 URL: https://issues.apache.org/jira/browse/HBASE-7342
>             Project: HBase
>          Issue Type: Bug
>          Components: HFile, io
>    Affects Versions: 0.94.1, 0.94.2, 0.94.3, 0.96.0
>            Reporter: Aleksandr Shulman
>            Assignee: Aleksandr Shulman
>            Priority: Minor
>             Fix For: 0.96.0, 0.94.4
>
>         Attachments: HBASE-7342-v1.patch, HBASE-7342-v2.patch
>
>
> I took a deeper look into issues I was having using region splitting when 
> specifying a region (but not a key for splitting).
> The midkey calculation is off by one: when there are 2 rows, it picks the 
> 0th one. This makes midkey the same as firstkey, so the split 
> fails. Removing the -1 causes it to work correctly, as per the test I've added.
> Looking into the code here is what goes on:
> 1. Split takes the largest storefile
> 2. It puts all the keys into a 2-dimensional array called blockKeys[][]. Key 
> i resides at blockKeys[i]
> 3. Getting the middle root-level index should yield the key in the middle of 
> the storefile
> 4. In step 3, we see that there is a possibly erroneous (-1) to adjust for 
> the 0-offset indexing.
> 5. In a case where there are only 2 blockKeys, this yields the 0th 
> block key.
> 6. Unfortunately, this is the same block key that 'firstKey' will be.
> 7. This yields the result in HStore.java:1873 ("cannot split because midkey 
> is the same as first or last row")
> 8. Removing the -1 solves the problem (in this case). 
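
The index arithmetic in steps 4 through 8 can be sketched as follows. This models only the arithmetic and is not the HBase source: the class and method names are hypothetical, and the exact form of the expression, (count - 1) / 2 versus count / 2, is an assumption consistent with the 2-blockKey case described above.

```java
// Simplified model of the index arithmetic only; not the HBase source.
// The exact expressions are assumptions that match the 2-key case in the report.
final class MidKeyIndexSketch {

    // Index chosen with the "-1" adjustment described in the report.
    static int midIndexWithMinusOne(int blockKeyCount) {
        return (blockKeyCount - 1) / 2;
    }

    // Index chosen once the "-1" is removed.
    static int midIndexWithoutMinusOne(int blockKeyCount) {
        return blockKeyCount / 2;
    }

    public static void main(String[] args) {
        // Hypothetical first keys of two blocks.
        String[] blockKeys = {"row-a", "row-m"};
        String firstKey = blockKeys[0];

        String before = blockKeys[midIndexWithMinusOne(blockKeys.length)];
        String after = blockKeys[midIndexWithoutMinusOne(blockKeys.length)];

        // With the -1, the midkey collides with firstKey and the split is refused.
        System.out.println("with -1:    " + before
            + " (same as firstKey: " + before.equals(firstKey) + ")");
        System.out.println("without -1: " + after
            + " (same as firstKey: " + after.equals(firstKey) + ")");
    }
}
```

With two block keys the adjusted index is 0 and the unadjusted index is 1, so only the latter yields a midkey distinct from firstKey.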

