Eungsop Yoo created HBASE-29039:
-----------------------------------

             Summary: Optimize read performance for accumulated delete markers on the same row or cell
                 Key: HBASE-29039
                 URL: https://issues.apache.org/jira/browse/HBASE-29039
             Project: HBase
          Issue Type: Improvement
    Affects Versions: 2.5.10, 2.6.1
            Reporter: Eungsop Yoo


As delete markers accumulate on the same row or cell, Get operations slow 
down. This can be reproduced with the following HBase shell commands.

{code}
create 'test', 'c'

java_import org.apache.hadoop.hbase.client.Delete
java_import org.apache.hadoop.hbase.TableName
java_import java.lang.System

# reuse the client connection held by the shell
con = @hbase.instance_variable_get(:@connection)
table = con.getTable(TableName.valueOf('test'))

1000.times do |i|
  # batch 10000 family deletes on the same row, each with a different timestamp
  now = System.currentTimeMillis()
  dels = 10000.times.map do |j|
    del = Delete.new(Bytes.toBytes('row'))
    del.addFamily(Bytes.toBytes('c'), now + j)
  end
  table.delete(dels)
  # wait 10 seconds, then time a Get against the accumulated markers
  sleep(10)
  puts "i - #{i}"
  get 'test', 'row'
end
{code}
{code}
i - 0
COLUMN                CELL
0 row(s)
Took 0.0251 seconds
...
i - 10
COLUMN                CELL
0 row(s)
Took 0.0412 seconds
...
i - 20
COLUMN                CELL
0 row(s)
Took 0.0760 seconds
...
i - 30
COLUMN                CELL
0 row(s)
Took 0.1014 seconds
...
i - 40
COLUMN                CELL
0 row(s)
Took 0.1616 seconds
...
{code}

However, Get performance can be optimized by using SEEK_NEXT_COL. With the 
change applied, latency stays flat as delete markers accumulate.
{code}
i - 1
COLUMN                CELL
0 row(s)
Took 0.0087 seconds
...
i - 11
COLUMN                CELL
0 row(s)
Took 0.0077 seconds
...
i - 21
COLUMN                CELL
0 row(s)
Took 0.0087 seconds
...
{code}
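
To illustrate why seeking helps, here is a toy model in plain Ruby. It is not HBase code and not the change in the PR. It assumes the slowdown comes from the scanner stepping over delete markers one cell at a time, and it compares that with jumping to the next column on the first marker. The names Cell, build_row, scan_with_skip and scan_with_seek_next_col are hypothetical, and the seek is modeled as a binary search over an in-memory sorted array.

```ruby
# Toy model only: NOT HBase code. It mimics a scanner walking the sorted
# cells of one row and counts how many cells a Get has to examine.
Cell = Struct.new(:qualifier, :timestamp, :type)

# Build one row: `marker_count` column-wide delete markers stacked on
# column 'a' (newest first), followed by one live cell in column 'b'.
def build_row(marker_count)
  markers = Array.new(marker_count) do |n|
    Cell.new('a', marker_count - n, :delete_column)
  end
  markers + [Cell.new('b', 1, :put)]
end

# Strategy 1: skip each delete marker individually, so the scanner
# advances one cell at a time through the whole pile.
def scan_with_skip(cells)
  examined = 0
  results = []
  cells.each do |cell|
    examined += 1
    results << cell if cell.type == :put
  end
  [results, examined]
end

# Strategy 2: on the first column-wide delete marker, seek to the next
# column. Every later cell in that column is older and therefore masked.
def scan_with_seek_next_col(cells)
  examined = 0
  results = []
  pos = 0
  while pos < cells.length
    cell = cells[pos]
    examined += 1
    if cell.type == :delete_column
      # assumption: a seek is modeled as a binary search for the first
      # cell whose qualifier sorts after the current one
      pos = cells.bsearch_index { |c| c.qualifier > cell.qualifier } || cells.length
    else
      results << cell
      pos += 1
    end
  end
  [results, examined]
end

[10, 1_000, 100_000].each do |n|
  row = build_row(n)
  _, skipped = scan_with_skip(row)
  _, sought = scan_with_seek_next_col(row)
  puts "markers=#{n} examined_with_skip=#{skipped} examined_with_seek=#{sought}"
end
```

In this model both strategies return the same cells, but the skip strategy examines every marker while the seek strategy examines a constant number of cells, which matches the flat latencies above in shape only.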

Please review the PR.


