Xiaolin Ha created HBASE-25322:
----------------------------------
Summary: Redundant Reference file in bottom region of split
Key: HBASE-25322
URL: https://issues.apache.org/jira/browse/HBASE-25322
Project: HBase
Issue Type: Improvement
Affects Versions: 3.0.0-alpha-1
Reporter: Xiaolin Ha
Assignee: Xiaolin Ha
When we split a region covering the key range (,), the bottom region should
contain the keys in (, split key), and the top region should contain the keys
in [split key, ).
Currently, if we do the following operations:
# put rowkeys 100,101,102,103,104,105 to a table, and flush the memstore to
make an hfile with rowkeys 100,101,102,103,104,105;
# put rowkeys 200,201,202,203,204,205 to the table, and flush the memstore to
make an hfile with rowkeys 200,201,202,203,204,205;
# split the table region, using split key 200;
# then the bottom region will have two Reference files, while the top region
has only one.
But we expect the bottom region to have only one Reference file, like the top
region.
That's because, when generating Reference files in the child regions, the
bottom region uses the `PrivateCellUtil.createLastOnRow(splitRow)` cell to
compare against the first keys of the hfiles, while the top region uses the
`PrivateCellUtil.createFirstOnRow(splitRow)` cell to compare against the last
keys of the hfiles.
`LastOnRow(splitRow)` is the largest possible cell key on the split row, while
`FirstOnRow(splitRow)` is the smallest possible cell key on the split row. The
split row belongs to the top region, so we should use `FirstOnRow(splitRow)`
to compare against the hfile first and last keys in both the bottom and top
regions.
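The effect of the two comparison keys can be reproduced with a small
standalone model. This is only a sketch and not the HBase code: `Key`, `Pos`,
`HFileRange` and the helper methods are hypothetical names, a cell key is
reduced to a row plus a position within that row, and the boundary conditions
(`>= 0` and `<= 0`) are assumptions made for the illustration.

```java
import java.util.Arrays;
import java.util.List;

public class SplitReferenceSketch {
    // Hypothetical stand-in for cell ordering inside one row: a synthetic
    // "first on row" key sorts before every real cell of the row, and a
    // synthetic "last on row" key sorts after every real cell of the row.
    enum Pos { FIRST_ON_ROW, REAL_CELL, LAST_ON_ROW }

    static final class Key {
        final String row;
        final Pos pos;
        Key(String row, Pos pos) { this.row = row; this.pos = pos; }
    }

    // First and last key of one hfile (both are real cells).
    static final class HFileRange {
        final Key firstKey;
        final Key lastKey;
        HFileRange(Key firstKey, Key lastKey) { this.firstKey = firstKey; this.lastKey = lastKey; }
        static HFileRange ofRows(String firstRow, String lastRow) {
            return new HFileRange(new Key(firstRow, Pos.REAL_CELL), new Key(lastRow, Pos.REAL_CELL));
        }
    }

    // Order by row first, then by position within the row.
    static int compare(Key a, Key b) {
        int c = a.row.compareTo(b.row);
        return c != 0 ? c : a.pos.compareTo(b.pos);
    }

    // Bottom region: the file is skipped only if the split key sorts before the file's first key.
    static boolean bottomNeedsReference(HFileRange f, Key splitKey) {
        return compare(splitKey, f.firstKey) >= 0;
    }

    // Top region: the file is skipped only if the split key sorts after the file's last key.
    static boolean topNeedsReference(HFileRange f, Key splitKey) {
        return compare(splitKey, f.lastKey) <= 0;
    }

    static int countBottom(List<HFileRange> files, Key splitKey) {
        int n = 0;
        for (HFileRange f : files) if (bottomNeedsReference(f, splitKey)) n++;
        return n;
    }

    static int countTop(List<HFileRange> files, Key splitKey) {
        int n = 0;
        for (HFileRange f : files) if (topNeedsReference(f, splitKey)) n++;
        return n;
    }

    public static void main(String[] args) {
        // The two hfiles from the steps above, split at row 200.
        List<HFileRange> files = Arrays.asList(
            HFileRange.ofRows("100", "105"),
            HFileRange.ofRows("200", "205"));
        Key first = new Key("200", Pos.FIRST_ON_ROW);
        Key last = new Key("200", Pos.LAST_ON_ROW);
        System.out.println("bottom, LastOnRow(200):  " + countBottom(files, last));
        System.out.println("bottom, FirstOnRow(200): " + countBottom(files, first));
        System.out.println("top,    FirstOnRow(200): " + countTop(files, first));
    }
}
```

In this model, `LastOnRow(200)` sorts after the first real cell of the second
hfile, so the bottom region keeps a reference to that file even though none of
its rows fall below the split key. With `FirstOnRow(200)` the second hfile is
skipped, and the bottom and top regions each end up with one reference.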
Although the redundant Reference file will never be read by the bottom region,
compacting it will produce an empty file if it is the only file participating
in the compaction.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)