[
https://issues.apache.org/jira/browse/HBASE-7342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13531247#comment-13531247
]
Aleksandr Shulman commented on HBASE-7342:
------------------------------------------
Hi Ramkrishna,
The logic for the change is as follows:
With the existing implementation (using -1), when there are two items in the
array, it returns the 0th item ( (2 - 1) / 2 = 0 ), which is the index of the
firstKey. This is a problem during splits because a split is invalid if the
midkey is equal to the firstKey. What we really want here is index 1. This
works because the lastKey is going to be the first key in the next block, so
there won't be a collision with it and the midkey will really represent the
midpoint between first and last.
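
To make the arithmetic concrete, here is a minimal sketch comparing the two
index calculations. The class and method names are hypothetical and exist only
for illustration; this is not the actual HBase code, just the integer division
described above.

```java
public class MidKeyIndexSketch {
    // Hypothetical helper: index as computed with the -1 in place.
    public static int oldMidIndex(int numBlockKeys) {
        return (numBlockKeys - 1) / 2;
    }

    // Hypothetical helper: index as computed once the -1 is removed.
    public static int newMidIndex(int numBlockKeys) {
        return numBlockKeys / 2;
    }

    public static void main(String[] args) {
        // Print both indices for a few small block-key counts.
        for (int n = 2; n <= 5; n++) {
            System.out.println("n=" + n
                + " old=" + oldMidIndex(n)
                + " new=" + newMidIndex(n));
        }
    }
}
```

With two block keys the old formula gives index 0 (the firstKey) and the new
one gives index 1. The two formulas agree for odd counts and differ by one for
even counts.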
> Split operation without split key incorrectly finds the middle key in
> off-by-one error
> --------------------------------------------------------------------------------------
>
> Key: HBASE-7342
> URL: https://issues.apache.org/jira/browse/HBASE-7342
> Project: HBase
> Issue Type: Bug
> Components: HFile, io
> Affects Versions: 0.94.1, 0.94.2, 0.94.3, 0.96.0
> Reporter: Aleksandr Shulman
> Assignee: Aleksandr Shulman
> Priority: Minor
> Fix For: 0.96.0, 0.94.4
>
> Attachments: HBASE-7342-v1.patch, HBASE-7342-v2.patch
>
>
> I took a deeper look into issues I was having using region splitting when
> specifying a region (but not a key for splitting).
> The midkey calculation is off by one and, when there are 2 rows, will pick
> the 0th one. This causes the firstKey to be the same as the midkey, and the
> split will fail. Removing the -1 causes it to work correctly, as per the test
> I've added.
> Looking into the code, here is what goes on:
> 1. Split takes the largest storefile
> 2. It puts all the keys into a 2-dimensional array called blockKeys[][]. Key
> i resides at blockKeys[i]
> 3. Getting the middle root-level index should yield the key in the middle of
> the storefile
> 4. In step 3, we see that there is a possibly erroneous (-1) to adjust for
> the 0-offset indexing.
> 5. In a case where there are only 2 blockKeys, this yields the 0th
> block key.
> 6. Unfortunately, this is the same block key that 'firstKey' will be.
> 7. This yields the result in HStore.java:1873 ("cannot split because midkey
> is the same as first or last row")
> 8. Removing the -1 solves the problem (in this case).