Eungsop Yoo created HBASE-29039:
-----------------------------------

             Summary: Optimize read performance for accumulated delete markers on the same row or cell
                 Key: HBASE-29039
                 URL: https://issues.apache.org/jira/browse/HBASE-29039
             Project: HBase
          Issue Type: Improvement
    Affects Versions: 2.5.10, 2.6.1
            Reporter: Eungsop Yoo

As delete markers accumulate on the same row or cell, Get operations slow down. This can be reproduced with the following HBase shell commands.

{code}
create 'test', 'c'

java_import org.apache.hadoop.hbase.client.Delete
java_import org.apache.hadoop.hbase.TableName
java_import java.lang.System

con = @hbase.instance_variable_get(:@connection)
table = con.getTable(TableName.valueOf('test'))

1000.times do |i|
  # batch 10000 deletes with different timestamps every 10 seconds
  now = System.currentTimeMillis()
  dels = 10000.times.map do |j|
    del = Delete.new(Bytes.toBytes('row'))
    # family delete marker at a distinct timestamp (now + j)
    del.addFamily(Bytes.toBytes('c'), now + j)
  end
  table.delete(dels)
  sleep(10)
  puts "i - #{i}"
  # time a Get on the row that now carries the accumulated delete markers
  get 'test', 'row'
end
{code}

The time taken by each Get grows as more delete markers are written:

{code}
i - 0
COLUMN  CELL
0 row(s)
Took 0.0251 seconds
...
i - 10
COLUMN  CELL
0 row(s)
Took 0.0412 seconds
...
i - 20
COLUMN  CELL
0 row(s)
Took 0.0760 seconds
...
i - 30
COLUMN  CELL
0 row(s)
Took 0.1014 seconds
...
i - 40
COLUMN  CELL
0 row(s)
Took 0.1616 seconds
...
{code}

But the performance of Get operations can be optimized by using SEEK_NEXT_COL. With that change, the same test gives:

{code}
i - 1
COLUMN  CELL
0 row(s)
Took 0.0087 seconds
...
i - 11
COLUMN  CELL
0 row(s)
Took 0.0077 seconds
...
i - 21
COLUMN  CELL
0 row(s)
Took 0.0087 seconds
...
{code}

Please review the PR.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)