[ https://issues.apache.org/jira/browse/HBASE-29039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Eungsop Yoo updated HBASE-29039:
--------------------------------
Description:
As delete markers accumulate on the same row or cell, Get operations slow
down. This can be reproduced with the following HBase shell commands.
{code}
create 'test', 'c'
java_import org.apache.hadoop.hbase.client.Delete
java_import org.apache.hadoop.hbase.TableName
java_import java.lang.System
con = @hbase.instance_variable_get(:@connection)
table = con.getTable(TableName.valueOf('test'))
1000.times do |i|
  # batch 10000 deletes with different timestamps every 10 seconds
  now = System.currentTimeMillis()
  dels = 10000.times.map do |j|
    del = Delete.new(Bytes.toBytes('row'))
    del.addFamily(Bytes.toBytes('c'), now + j)
  end
  table.delete(dels)
  sleep(10)
  puts "i - #{i}"
  get 'test', 'row'
end
{code}
{code}
i - 0
COLUMN  CELL
0 row(s)
Took 0.0251 seconds
...
i - 10
COLUMN  CELL
0 row(s)
Took 0.0412 seconds
...
i - 20
COLUMN  CELL
0 row(s)
Took 0.0760 seconds
...
i - 30
COLUMN  CELL
0 row(s)
Took 0.1014 seconds
...
i - 40
COLUMN  CELL
0 row(s)
Took 0.1616 seconds
...
{code}
However, Get performance can be improved by using SEEK_NEXT_COL. With the
change, the same test gives:
{code}
i - 1
COLUMN  CELL
0 row(s)
Took 0.0087 seconds
...
i - 11
COLUMN  CELL
0 row(s)
Took 0.0077 seconds
...
i - 21
COLUMN  CELL
0 row(s)
Took 0.0087 seconds
...
{code}
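To illustrate why seeking helps, here is a minimal toy model in plain Ruby. It
is not HBase code and does not reproduce the PR: the Cell struct and the
build_row, read_with_skip and read_with_seek helpers are hypothetical names
made up for this sketch. It assumes the cells of a row are sorted with the
family delete markers first, newest timestamp first, so the first marker
already covers every older one.

```ruby
# Toy model only: a "row" is a sorted array of cells, not a real HBase store.
Cell = Struct.new(:column, :timestamp, :type)

# Family delete markers (empty column) sort before the data cells,
# newest timestamp first.
def build_row(marker_count)
  markers = Array.new(marker_count) do |t|
    Cell.new('', marker_count - t, :delete_family)
  end
  data_cells = [Cell.new('q1', 1, :put), Cell.new('q2', 1, :put)]
  markers + data_cells
end

# Visit every cell in order, skipping markers one at a time.
def read_with_skip(cells)
  steps = 0
  latest_delete = -1
  visible = []
  cells.each do |cell|
    steps += 1
    if cell.type == :delete_family
      latest_delete = [latest_delete, cell.timestamp].max
    elsif cell.timestamp > latest_delete
      visible << cell
    end
  end
  [visible, steps]
end

# After the first (newest) marker, jump straight to the next column.
def read_with_seek(cells)
  steps = 0
  latest_delete = -1
  visible = []
  i = 0
  while i < cells.length
    cell = cells[i]
    steps += 1
    if cell.type == :delete_family
      latest_delete = [latest_delete, cell.timestamp].max
      col = cell.column
      # Binary search for the first cell of a later column.
      i = cells.bsearch_index { |c| c.column > col } || cells.length
    else
      visible << cell if cell.timestamp > latest_delete
      i += 1
    end
  end
  [visible, steps]
end

row = build_row(10_000)
_, skip_steps = read_with_skip(row)
_, seek_steps = read_with_seek(row)
puts "cells visited with skip: #{skip_steps}, with seek: #{seek_steps}"
```

Both readers return the same (empty) result, but the skipping reader visits
every marker while the seeking reader visits only the first one. The seek
itself is not free (a binary search here), so this only sketches the trend.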
Please review the PR.
https://github.com/apache/hbase/pull/6557
> Optimize read performance for accumulated delete markers on the same row or
> cell
> --------------------------------------------------------------------------------
>
> Key: HBASE-29039
> URL: https://issues.apache.org/jira/browse/HBASE-29039
> Project: HBase
> Issue Type: Improvement
> Affects Versions: 2.6.1, 2.5.10
> Reporter: Eungsop Yoo
> Priority: Major