[ 
https://issues.apache.org/jira/browse/HBASE-25322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaolin Ha updated HBASE-25322:
-------------------------------
    Description: 
When we split a region whose range is (,), the bottom region should contain the 
keys in (, split key), and the top region should contain the keys in 
[split key, ).

Currently, if we do the following operations:
 # put rowkeys 100,101,102,103,104,105 to a table, and flush the memstore to 
make an hfile with rowkeys 100,101,102,103,104,105;
 # put rowkeys 200,201,202,203,204,205 to the table, and flush the memstore to 
make an hfile with rowkeys 200,201,202,203,204,205;
 # split the table region, using split key 200;
 # then the bottom region will have two Reference files, while the top region 
has only one.

But we expect the bottom region to have only one Reference file, the same as 
the top region.

That's because, when generating Reference files in the child regions, the 
bottom region compares the `PrivateCellUtil.createLastOnRow(splitRow)` cell to 
the first keys of the hfiles, while the top region compares the 
`PrivateCellUtil.createFirstOnRow(splitRow)` cell to the last keys of the 
hfiles.

`LastOnRow(splitRow)` is the largest possible cell on the split row, while 
`FirstOnRow(splitRow)` is the smallest possible cell on the split row. The 
split row should be in the top region, so we should compare 
`FirstOnRow(splitRow)` to the hfile first and last keys in both the bottom and 
top regions.
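
The comparison described above can be sketched with a simplified model. This is 
only an illustration, not the HBase code: `SplitReferenceSketch`, `Key`, `Pos`, 
and the three `...GetsReference...` methods are hypothetical names, rows are 
plain strings, and the exact comparison operators are assumptions chosen to 
match the behaviour described in this report. A fake first-on-row key sorts 
before every real cell of its row, and a fake last-on-row key sorts after them.

```java
final class SplitReferenceSketch {
    /** Where a key sits within its row; the declaration order is the sort order. */
    enum Pos { FIRST_ON_ROW, REAL_CELL, LAST_ON_ROW }

    /** Simplified stand-in for a cell key: a row plus its position in that row. */
    static final class Key {
        final String row;
        final Pos pos;
        Key(String row, Pos pos) { this.row = row; this.pos = pos; }
    }

    /** Orders keys by row first, then by position within the row. */
    static int compare(Key a, Key b) {
        int byRow = a.row.compareTo(b.row);
        return byRow != 0 ? byRow : a.pos.compareTo(b.pos);
    }

    /** Reported behaviour (assumed operators): bottom uses LastOnRow(splitRow). */
    static boolean bottomGetsReferenceCurrent(String splitRow, String fileFirstRow) {
        Key splitKey = new Key(splitRow, Pos.LAST_ON_ROW);
        Key firstKey = new Key(fileFirstRow, Pos.REAL_CELL);
        // The file is skipped only when the split key sorts before its first key.
        return compare(splitKey, firstKey) >= 0;
    }

    /** Proposed behaviour (assumed operators): bottom uses FirstOnRow(splitRow). */
    static boolean bottomGetsReferenceProposed(String splitRow, String fileFirstRow) {
        Key splitKey = new Key(splitRow, Pos.FIRST_ON_ROW);
        Key firstKey = new Key(fileFirstRow, Pos.REAL_CELL);
        // The file is skipped when the split key sorts at or before its first key.
        return compare(splitKey, firstKey) > 0;
    }

    /** Top region: FirstOnRow(splitRow) against the last key of the file. */
    static boolean topGetsReference(String splitRow, String fileLastRow) {
        Key splitKey = new Key(splitRow, Pos.FIRST_ON_ROW);
        Key lastKey = new Key(fileLastRow, Pos.REAL_CELL);
        // The file is skipped when the split key sorts after its last key.
        return compare(splitKey, lastKey) <= 0;
    }

    public static void main(String[] args) {
        String split = "200";
        // Each entry is {first row, last row} of one flushed hfile from the steps above.
        String[][] hfiles = { {"100", "105"}, {"200", "205"} };
        int current = 0, proposed = 0, top = 0;
        for (String[] f : hfiles) {
            if (bottomGetsReferenceCurrent(split, f[0])) current++;
            if (bottomGetsReferenceProposed(split, f[0])) proposed++;
            if (topGetsReference(split, f[1])) top++;
        }
        // prints "current bottom=2, proposed bottom=1, top=1"
        System.out.println("current bottom=" + current
            + ", proposed bottom=" + proposed + ", top=" + top);
    }
}
```

In this model the hfile that starts at row 200 gets a bottom Reference only 
because `LastOnRow(200)` sorts after the real cell on row 200; with 
`FirstOnRow(200)` the file is skipped, so both child regions end up with one 
Reference file each.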

Although the bottom region never reads the redundant Reference file, compacting 
it produces an empty file if it is the only file that participates in the 
compaction.

  was:
When we split a region ranges from (,), the bottom region should contain keys 
of(,split key), and the top region should contain keys of [split key, ).

Currently, if we do the following operations:
 # put rowkeys 100,101,102,103,104,105 to a table, and flush the memstore to 
make a hfile with rowkyes 100,101,102,103,104,105;
 # put rowkeys 200,201,202,203,204,205 to the table, and flush the memstore to 
make a hfile with rowkyes 200,201,202,203,204,205;
 # split the table region, using split key 200;
 # then the bottom region will has two Reference files, while the top region 
only has one.

But we expect the bottom region has only one Reference file as the the top 
region.

That's because when generating Reference files in child region,  the bottom 
region used the `PrivateCellUtil.createLastOnRow(splitRow)` cell to compare to 
first keys in the hfiles, while the top region used 
`PrivateCellUtil.createFirstOnRow(splitRow)` cell to compare to last keys in 
the hfiles.

`LastOnRow(splitRow)` means the maximum row generated by the split row, while 
`FirstOnRow(splitRow)` means the minimus row generated by the split row. The 
split row should be in the top region. And we should use `FirstOnRow(splitRow)` 
compare to hfile first and last keys in both bottom and top region. 

Though the redundant Reference file will not be read by the bottom region, the 
compaction of the redundant Reference file will result in empty file if only 
this redundant Reference file participates in a compaction.


> Redundant Reference file in bottom region of split
> --------------------------------------------------
>
>                 Key: HBASE-25322
>                 URL: https://issues.apache.org/jira/browse/HBASE-25322
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 3.0.0-alpha-1
>            Reporter: Xiaolin Ha
>            Assignee: Xiaolin Ha
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
