[
https://issues.apache.org/jira/browse/HBASE-29039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Eungsop Yoo updated HBASE-29039:
--------------------------------
Description:
I encountered a problem where some Get operations take several seconds.
!screenshot-2.png!
It turned out that users Put and Delete the same rows repeatedly. As
delete markers accumulate on the same row or cell, Get operations slow
down. The issue can be reproduced with the following HBase shell commands.
{code}
create 'test', 'c'

java_import org.apache.hadoop.hbase.client.Delete
java_import org.apache.hadoop.hbase.TableName
java_import org.apache.hadoop.hbase.util.Bytes
java_import java.lang.System

# grab the shell's underlying Connection to issue batched client-side Deletes
con = @hbase.instance_variable_get(:@connection)
table = con.getTable(TableName.valueOf('test'))

1000.times do |i|
  # batch 10000 deletes with different timestamps every 10 seconds
  now = System.currentTimeMillis()
  dels = 10000.times.map do |j|
    del = Delete.new(Bytes.toBytes('row'))
    del.addFamily(Bytes.toBytes('c'), now + j)  # addFamily returns the Delete itself
  end
  table.delete(dels)
  sleep(10)
  puts "i - #{i}"
  get 'test', 'row'
end
{code}
{code}
i - 0
COLUMN                                             CELL
0 row(s)
Took 0.0251 seconds
...
i - 10
COLUMN                                             CELL
0 row(s)
Took 0.0412 seconds
...
i - 20
COLUMN                                             CELL
0 row(s)
Took 0.0760 seconds
...
i - 30
COLUMN                                             CELL
0 row(s)
Took 0.1014 seconds
...
i - 40
COLUMN                                             CELL
0 row(s)
Took 0.1616 seconds
...
{code}
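As a side note, the accumulation itself can be observed with a raw scan, which returns delete markers as ordinary cells instead of applying them. Below is a minimal Java client sketch (not part of the patch; the class name is made up) that counts the cells piled up on the test row:
{code}
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;

// Hypothetical helper, not part of the PR: counts the cells (here, almost
// entirely delete markers) accumulated on the 'test' table's row.
public class CountDeleteMarkers {
  public static void main(String[] args) throws Exception {
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Table table = conn.getTable(TableName.valueOf("test"))) {
      // A raw scan returns delete markers as ordinary cells instead of applying them.
      Scan scan = new Scan().setRaw(true).readAllVersions();
      long cells = 0;
      try (ResultScanner scanner = table.getScanner(scan)) {
        for (Result r : scanner) {
          cells += r.rawCells().length;
        }
      }
      System.out.println("raw cells (mostly delete markers): " + cells);
    }
  }
}
{code}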
However, the performance of Get operations can be optimized by using SEEK_NEXT_COL. With that change, the same test produces:
{code}
i - 1
COLUMN                                             CELL
0 row(s)
Took 0.0087 seconds
...
i - 11
COLUMN                                             CELL
0 row(s)
Took 0.0077 seconds
...
i - 21
COLUMN                                             CELL
0 row(s)
Took 0.0087 seconds
...
{code}
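For background on why SEEK_NEXT_COL helps (a simplified sketch of the idea, not the actual ScanQueryMatcher change in the PR): when the current cell and every remaining version in its column are covered by a delete marker, returning SEEK_NEXT_COL lets the store scanner jump to the next column with a single seek, while returning SKIP forces it to step over each accumulated delete marker one cell at a time.
{code}
// Simplified illustration only; the real decision is made inside HBase's
// ScanQueryMatcher, and the actual change is in the linked PR.
enum MatchCode { SKIP, SEEK_NEXT_COL }

final class DeleteCoverageSketch {
  /**
   * Decide how to advance when the current cell is covered by a delete marker.
   *
   * @param restOfColumnCovered true when the marker (e.g. a delete-family
   *        marker) also covers every remaining cell in the current column
   */
  static MatchCode onCoveredCell(boolean restOfColumnCovered) {
    // SKIP: step over one cell at a time -> O(#accumulated markers) per column.
    // SEEK_NEXT_COL: a single seek past the column -> cost independent of how
    // many delete markers have piled up.
    return restOfColumnCovered ? MatchCode.SEEK_NEXT_COL : MatchCode.SKIP;
  }
}
{code}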
Please review the PR.
https://github.com/apache/hbase/pull/6557
> Optimize read performance for accumulated delete markers on the same row or cell
> --------------------------------------------------------------------------------
>
> Key: HBASE-29039
> URL: https://issues.apache.org/jira/browse/HBASE-29039
> Project: HBase
> Issue Type: Improvement
> Affects Versions: 2.6.1, 2.5.10
> Reporter: Eungsop Yoo
> Priority: Major
> Labels: pull-request-available
> Attachments: screenshot-2.png
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)