We needn't pre-split the DRS.
For example:
    roll (split) a DRS once its size exceeds 3 rows
    row5 means the row with primary key 5

1. init
     MRS: {}
     DRS0(min, max): {}
2. insert three rows
     MRS: {row5, row6, row8}
     DRS0(min, max): {}

3. flush
     MRS: {}
     DRS0(min, max): {row5, row6, row8}

4. insert
    MRS: {row1, row9, row10, row11}
    DRS0(min, max): {row5, row6, row8}

5. flush && split
    MRS: {}
    DRS0(min, max): {row1, row5, row6, row8, row9, row10, row11}  ->
        DRS0(min, 6]: {row1, row5, row6}
        DRS1(6, 10]: {row8, row9, row10}
        DRS2(10, max): {row11}
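
The steps above can be sketched as a small simulation. This is only a toy model of the idea, not Kudu code: the function name flush_and_split and the constant MAX_ROWS are made up for illustration, rows are represented by their integer primary keys, and -inf / +inf stand in for min / max.

```python
import math

# Assumed roll threshold from the example: a DRS holds at most 3 rows.
MAX_ROWS = 3


def flush_and_split(drs_rows, mrs_rows, max_rows=MAX_ROWS):
    """Merge flushed MRS keys into a DRS, then split the result into
    non-overlapping key ranges of at most max_rows rows each.

    Returns a list of (lower_exclusive, upper_inclusive, rows) tuples.
    The first range starts at -inf and the last one ends at +inf.
    """
    merged = sorted(set(drs_rows) | set(mrs_rows))
    if not merged:
        return [(-math.inf, math.inf, [])]

    # Cut the sorted keys into consecutive chunks of max_rows keys.
    chunks = [merged[i:i + max_rows] for i in range(0, len(merged), max_rows)]

    result = []
    lower = -math.inf
    for idx, chunk in enumerate(chunks):
        is_last = idx == len(chunks) - 1
        # The largest key of a chunk becomes its inclusive upper bound,
        # except for the last chunk, which stays open-ended.
        upper = math.inf if is_last else chunk[-1]
        result.append((lower, upper, chunk))
        lower = upper
    return result


# First flush: three rows fit into one DRS, so no split happens.
first = flush_and_split([], [5, 6, 8])
# Second flush: seven rows exceed the threshold, so the DRS is split.
second = flush_and_split([5, 6, 8], [1, 9, 10, 11])
```

Here the first flush leaves a single DRS holding keys 5, 6 and 8, and the second flush produces three DRSs holding {1, 5, 6}, {8, 9, 10} and {11}. The sketch rewrites the whole DRS on every flush, which is a simplification; it says nothing about how the redo log would be split.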


Negative side effects:
1. fragmentation of DiskRowSets
2. splitting the redo log is complicated


On 2016-03-17 11:42, Binglin Chang wrote:

How can this be "bootstrapped"?
At the beginning, there is no DRS, only one MRS.
It's hard to pre-split DRSs if you don't know the key distribution, and the
key distribution may change over time.

On Thu, Mar 17, 2016 at 11:36 AM, 曾 杰南 <[email protected]> wrote:




Hi all:
I have studied Kudu's paper "Kudu: Storage for Fast Analytics on Fast Data"
closely to find out why HBase's random-query performance is superior to
Kudu's. One of the reasons may be that "the primary key intervals of
different RowSets may intersect".

My confusion is why DiskRowSets are not kept globally ordered on primary key.
When the MemRowSet is flushed, its rows could be dispatched to the
DeltaMemStores of the corresponding DiskRowSets. The negative side effect is
fragmentation of DiskRowSets, but that is worth it for globally ordered
DiskRowSets.
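
To make the lookup-cost argument concrete, here is a minimal sketch comparing the two layouts. It is a toy model under stated assumptions: both function names are invented for illustration, keys are integers, and it ignores any filtering or indexing that a real storage engine applies before touching a RowSet.

```python
from bisect import bisect_left


def rowsets_to_probe_overlapping(rowsets, key):
    """Return the indexes of every RowSet whose [min_key, max_key]
    interval contains key. When intervals overlap, each of these
    RowSets may hold the row, so all of them are candidates."""
    return [i for i, (lo, hi) in enumerate(rowsets) if lo <= key <= hi]


def rowset_to_probe_ordered(upper_bounds, key):
    """Return the index of the single RowSet responsible for key when
    the RowSets partition the key space into (prev_upper, upper] ranges.

    upper_bounds is the sorted list of inclusive upper bounds. The last
    RowSet is open-ended, so it has no entry in the list.
    """
    return bisect_left(upper_bounds, key)


# Two flushes without splitting: keys {5, 6, 8} and {1, 9, 10, 11}
# give the overlapping intervals [5, 8] and [1, 11].
overlapping = [(5, 8), (1, 11)]
candidates = rowsets_to_probe_overlapping(overlapping, 6)

# Globally ordered layout: (min, 6], (6, 10], (10, max).
ordered_bounds = [6, 10]
target = rowset_to_probe_ordered(ordered_bounds, 6)
```

In the overlapping layout, key 6 falls inside both intervals, so both RowSets are candidates. In the ordered layout a binary search over the range bounds picks exactly one RowSet.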

best
jie






