[
https://issues.apache.org/jira/browse/HBASE-29039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Eungsop Yoo updated HBASE-29039:
--------------------------------
Description:
I encountered a problem where some Get operations take several seconds.
!screenshot-2.png!
It turned out that users Put and Delete the same rows repeatedly. As
delete markers accumulate on the same row or cell, Get operations slow
down. The issue can be reproduced with the following HBase shell commands.
{code}
create 'test', 'c'

java_import org.apache.hadoop.hbase.client.Delete
java_import org.apache.hadoop.hbase.TableName
java_import org.apache.hadoop.hbase.util.Bytes
java_import java.lang.System

# grab the shell's underlying Connection to issue batched client-side Deletes
con = @hbase.instance_variable_get(:@connection)
table = con.getTable(TableName.valueOf('test'))

1000.times do |i|
  # batch 10000 deletes with different timestamps every 10 seconds
  now = System.currentTimeMillis()
  dels = 10000.times.map do |j|
    del = Delete.new(Bytes.toBytes('row'))
    del.addFamily(Bytes.toBytes('c'), now + j)  # addFamily returns the Delete itself
  end
  table.delete(dels)
  sleep(10)
  puts "i - #{i}"
  get 'test', 'row'
end
{code}
{code}
i - 0
COLUMN                                             CELL
0 row(s)
Took 0.0251 seconds
...
i - 10
COLUMN                                             CELL
0 row(s)
Took 0.0412 seconds
...
i - 20
COLUMN                                             CELL
0 row(s)
Took 0.0760 seconds
...
i - 30
COLUMN                                             CELL
0 row(s)
Took 0.1014 seconds
...
i - 40
COLUMN                                             CELL
0 row(s)
Took 0.1616 seconds
...
{code}
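As a side note, the accumulation itself can be observed with a raw scan, which returns delete markers as ordinary cells instead of applying them. Below is a minimal Java client sketch (not part of the patch; the class name is made up) that counts the cells piled up on the test row:
{code}
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;

// Hypothetical helper, not part of the PR: counts the cells (here, almost
// entirely delete markers) accumulated on the 'test' table's row.
public class CountDeleteMarkers {
  public static void main(String[] args) throws Exception {
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Table table = conn.getTable(TableName.valueOf("test"))) {
      // A raw scan returns delete markers as ordinary cells instead of applying them.
      Scan scan = new Scan().setRaw(true).readAllVersions();
      long cells = 0;
      try (ResultScanner scanner = table.getScanner(scan)) {
        for (Result r : scanner) {
          cells += r.rawCells().length;
        }
      }
      System.out.println("raw cells (mostly delete markers): " + cells);
    }
  }
}
{code}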
However, the performance of Get operations can be optimized by using SEEK_NEXT_COL. With that change, the same test produces:
{code}
i - 1
COLUMN                                             CELL
0 row(s)
Took 0.0087 seconds
...
i - 11
COLUMN                                             CELL
0 row(s)
Took 0.0077 seconds
...
i - 21
COLUMN                                             CELL
0 row(s)
Took 0.0087 seconds
...
{code}
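For background on why SEEK_NEXT_COL helps (a simplified sketch of the idea, not the actual ScanQueryMatcher change in the PR): when the current cell and every remaining version in its column are covered by a delete marker, returning SEEK_NEXT_COL lets the store scanner jump to the next column with a single seek, while returning SKIP forces it to step over each accumulated delete marker one cell at a time.
{code}
// Simplified illustration only; the real decision is made inside HBase's
// ScanQueryMatcher, and the actual change is in the linked PR.
enum MatchCode { SKIP, SEEK_NEXT_COL }

final class DeleteCoverageSketch {
  /**
   * Decide how to advance when the current cell is covered by a delete marker.
   *
   * @param restOfColumnCovered true when the marker (e.g. a delete-family
   *        marker) also covers every remaining cell in the current column
   */
  static MatchCode onCoveredCell(boolean restOfColumnCovered) {
    // SKIP: step over one cell at a time -> O(#accumulated markers) per column.
    // SEEK_NEXT_COL: a single seek past the column -> cost independent of how
    // many delete markers have piled up.
    return restOfColumnCovered ? MatchCode.SEEK_NEXT_COL : MatchCode.SKIP;
  }
}
{code}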
Please review the PR.
https://github.com/apache/hbase/pull/6557
> Optimize read performance for accumulated delete markers on the same row or cell
> --------------------------------------------------------------------------------
>
> Key: HBASE-29039
> URL: https://issues.apache.org/jira/browse/HBASE-29039
> Project: HBase
> Issue Type: Improvement
> Affects Versions: 2.6.1, 2.5.10
> Reporter: Eungsop Yoo
> Priority: Major
> Labels: pull-request-available
> Attachments: screenshot-2.png
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)