[ 
https://issues.apache.org/jira/browse/HBASE-26487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17460826#comment-17460826
 ] 

Duo Zhang commented on HBASE-26487:
-----------------------------------

I used the RegionReplicationLagEvaluation tool introduced in HBASE-26540 to
compare the replication lag of the master branch and the HBASE-26233 branch.

For HBASE-26233, I executed the following command 3 times:
{noformat}
./bin/hbase org.apache.hadoop.hbase.RegionReplicationLagEvaluation -vlen 1024 -r 10000
{noformat}

The results are
{noformat}
2021-12-16T23:16:50,280 INFO  [main] hbase.RegionReplicationLagEvaluation: Test finished, min lag 0 ms, max lag 22 ms, mean lag 0 ms
2021-12-16T23:16:50,280 INFO  [main] hbase.RegionReplicationLagEvaluation: 25.0% lag: 0 ms
2021-12-16T23:16:50,280 INFO  [main] hbase.RegionReplicationLagEvaluation: 50.0% lag: 0 ms
2021-12-16T23:16:50,280 INFO  [main] hbase.RegionReplicationLagEvaluation: 75.0% lag: 0 ms
2021-12-16T23:16:50,280 INFO  [main] hbase.RegionReplicationLagEvaluation: 90.0% lag: 0 ms
2021-12-16T23:16:50,280 INFO  [main] hbase.RegionReplicationLagEvaluation: 95.0% lag: 0 ms
2021-12-16T23:16:50,280 INFO  [main] hbase.RegionReplicationLagEvaluation: 98.0% lag: 1 ms
2021-12-16T23:16:50,280 INFO  [main] hbase.RegionReplicationLagEvaluation: 99.0% lag: 2 ms
2021-12-16T23:16:50,280 INFO  [main] hbase.RegionReplicationLagEvaluation: 99.9% lag: 3 ms

2021-12-16T23:17:20,229 INFO  [main] hbase.RegionReplicationLagEvaluation: Test finished, min lag 0 ms, max lag 7 ms, mean lag 0 ms
2021-12-16T23:17:20,229 INFO  [main] hbase.RegionReplicationLagEvaluation: 25.0% lag: 0 ms
2021-12-16T23:17:20,229 INFO  [main] hbase.RegionReplicationLagEvaluation: 50.0% lag: 0 ms
2021-12-16T23:17:20,229 INFO  [main] hbase.RegionReplicationLagEvaluation: 75.0% lag: 0 ms
2021-12-16T23:17:20,230 INFO  [main] hbase.RegionReplicationLagEvaluation: 90.0% lag: 0 ms
2021-12-16T23:17:20,230 INFO  [main] hbase.RegionReplicationLagEvaluation: 95.0% lag: 0 ms
2021-12-16T23:17:20,230 INFO  [main] hbase.RegionReplicationLagEvaluation: 98.0% lag: 0 ms
2021-12-16T23:17:20,230 INFO  [main] hbase.RegionReplicationLagEvaluation: 99.0% lag: 0 ms
2021-12-16T23:17:20,230 INFO  [main] hbase.RegionReplicationLagEvaluation: 99.9% lag: 3 ms

2021-12-16T23:19:25,922 INFO  [main] hbase.RegionReplicationLagEvaluation: Test finished, min lag 0 ms, max lag 17 ms, mean lag 0 ms
2021-12-16T23:19:25,922 INFO  [main] hbase.RegionReplicationLagEvaluation: 25.0% lag: 0 ms
2021-12-16T23:19:25,923 INFO  [main] hbase.RegionReplicationLagEvaluation: 50.0% lag: 0 ms
2021-12-16T23:19:25,923 INFO  [main] hbase.RegionReplicationLagEvaluation: 75.0% lag: 0 ms
2021-12-16T23:19:25,923 INFO  [main] hbase.RegionReplicationLagEvaluation: 90.0% lag: 0 ms
2021-12-16T23:19:25,923 INFO  [main] hbase.RegionReplicationLagEvaluation: 95.0% lag: 0 ms
2021-12-16T23:19:25,923 INFO  [main] hbase.RegionReplicationLagEvaluation: 98.0% lag: 0 ms
2021-12-16T23:19:25,923 INFO  [main] hbase.RegionReplicationLagEvaluation: 99.0% lag: 0 ms
2021-12-16T23:19:25,923 INFO  [main] hbase.RegionReplicationLagEvaluation: 99.9% lag: 3 ms
{noformat}

For the master branch, I set replication.source.sleepforretries to 100 ms,
since the default of 1000 ms makes the test run for a very long time. To make
the test finish faster still, I also reduced the number of rows from 10000 to
1000. I again executed the command 3 times:
{noformat}
./bin/hbase org.apache.hadoop.hbase.RegionReplicationLagEvaluation -vlen 1024 -r 1000
{noformat}

The results are
{noformat}
2021-12-16T23:38:30,805 INFO  [main] hbase.RegionReplicationLagEvaluation: Test finished, min lag 3 ms, max lag 349 ms, mean lag 101 ms
2021-12-16T23:38:30,805 INFO  [main] hbase.RegionReplicationLagEvaluation: 25.0% lag: 14 ms
2021-12-16T23:38:30,805 INFO  [main] hbase.RegionReplicationLagEvaluation: 50.0% lag: 98 ms
2021-12-16T23:38:30,805 INFO  [main] hbase.RegionReplicationLagEvaluation: 75.0% lag: 184 ms
2021-12-16T23:38:30,806 INFO  [main] hbase.RegionReplicationLagEvaluation: 90.0% lag: 187 ms
2021-12-16T23:38:30,806 INFO  [main] hbase.RegionReplicationLagEvaluation: 95.0% lag: 188 ms
2021-12-16T23:38:30,806 INFO  [main] hbase.RegionReplicationLagEvaluation: 98.0% lag: 190 ms
2021-12-16T23:38:30,806 INFO  [main] hbase.RegionReplicationLagEvaluation: 99.0% lag: 192 ms
2021-12-16T23:38:30,806 INFO  [main] hbase.RegionReplicationLagEvaluation: 99.9% lag: 200 ms

2021-12-16T23:40:51,459 INFO  [main] hbase.RegionReplicationLagEvaluation: Test finished, min lag 2 ms, max lag 290 ms, mean lag 101 ms
2021-12-16T23:40:51,459 INFO  [main] hbase.RegionReplicationLagEvaluation: 25.0% lag: 15 ms
2021-12-16T23:40:51,459 INFO  [main] hbase.RegionReplicationLagEvaluation: 50.0% lag: 85 ms
2021-12-16T23:40:51,460 INFO  [main] hbase.RegionReplicationLagEvaluation: 75.0% lag: 183 ms
2021-12-16T23:40:51,460 INFO  [main] hbase.RegionReplicationLagEvaluation: 90.0% lag: 185 ms
2021-12-16T23:40:51,460 INFO  [main] hbase.RegionReplicationLagEvaluation: 95.0% lag: 187 ms
2021-12-16T23:40:51,460 INFO  [main] hbase.RegionReplicationLagEvaluation: 98.0% lag: 188 ms
2021-12-16T23:40:51,460 INFO  [main] hbase.RegionReplicationLagEvaluation: 99.0% lag: 188 ms
2021-12-16T23:40:51,460 INFO  [main] hbase.RegionReplicationLagEvaluation: 99.9% lag: 196 ms

2021-12-16T23:42:50,438 INFO  [main] hbase.RegionReplicationLagEvaluation: Test finished, min lag 2 ms, max lag 289 ms, mean lag 101 ms
2021-12-16T23:42:50,439 INFO  [main] hbase.RegionReplicationLagEvaluation: 25.0% lag: 15 ms
2021-12-16T23:42:50,439 INFO  [main] hbase.RegionReplicationLagEvaluation: 50.0% lag: 91 ms
2021-12-16T23:42:50,439 INFO  [main] hbase.RegionReplicationLagEvaluation: 75.0% lag: 183 ms
2021-12-16T23:42:50,439 INFO  [main] hbase.RegionReplicationLagEvaluation: 90.0% lag: 184 ms
2021-12-16T23:42:50,439 INFO  [main] hbase.RegionReplicationLagEvaluation: 95.0% lag: 186 ms
2021-12-16T23:42:50,439 INFO  [main] hbase.RegionReplicationLagEvaluation: 98.0% lag: 188 ms
2021-12-16T23:42:50,439 INFO  [main] hbase.RegionReplicationLagEvaluation: 99.0% lag: 188 ms
2021-12-16T23:42:50,439 INFO  [main] hbase.RegionReplicationLagEvaluation: 99.9% lag: 196 ms
{noformat}

The result for HBASE-26233 is much better, as expected, since it sends out
the WAL edits directly, without touching the WAL files. It is also exciting
that the latency is really small: even the 99.9% lag is still less than 10 ms.
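
To compare runs like these side by side, the log output can be turned into
numbers with a few lines of Python. This is only a minimal sketch, not part of
the tool: the function name parse_runs and the SAMPLE string are made up for
illustration, and the patterns are derived solely from the log lines quoted
in this comment. It collapses whitespace first, so it works whether or not a
mail client has wrapped the long log lines.

```python
import re

# Patterns derived from the log lines quoted in this comment (an assumption:
# other versions of the tool may print a different format).
SUMMARY = re.compile(r"min lag (\d+) ms, max lag (\d+) ms, mean lag (\d+) ms")
PERCENTILE = re.compile(r"(\d+(?:\.\d+)?)% lag: (\d+) ms")


def parse_runs(log_text):
    """Return one dict per test run found in the log text."""
    # Collapse all whitespace so that wrapped log lines are joined again.
    flat = " ".join(log_text.split())
    runs = []
    # Every run starts with a "Test finished" summary line.
    for chunk in flat.split("Test finished")[1:]:
        summary = SUMMARY.search(chunk)
        if summary is None:
            continue
        runs.append({
            "min": int(summary.group(1)),
            "max": int(summary.group(2)),
            "mean": int(summary.group(3)),
            "percentiles": {
                float(p): int(ms) for p, ms in PERCENTILE.findall(chunk)
            },
        })
    return runs


# Shortened sample: the first HBASE-26233 run and the first master run.
SAMPLE = """\
2021-12-16T23:16:50,280 INFO  [main] hbase.RegionReplicationLagEvaluation: Test
finished, min lag 0 ms, max lag 22 ms, mean lag 0 ms
2021-12-16T23:16:50,280 INFO  [main] hbase.RegionReplicationLagEvaluation:
99.9% lag: 3 ms
2021-12-16T23:38:30,805 INFO  [main] hbase.RegionReplicationLagEvaluation: Test
finished, min lag 3 ms, max lag 349 ms, mean lag 101 ms
2021-12-16T23:38:30,805 INFO  [main] hbase.RegionReplicationLagEvaluation:
99.9% lag: 200 ms
"""

for run in parse_runs(SAMPLE):
    print(run["mean"], run["max"], run["percentiles"][99.9])
```

Feeding it the full output of all six runs would give one dict per run, which
makes it easy to average the percentiles per branch.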

For the master branch, the latency depends heavily on the
replication.source.sleepforretries config, as we sleep for a while if
there is nothing to replicate. But we cannot set
replication.source.sleepforretries too small, as that would put more pressure
on the HDFS NameNode.
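
The cost side of this trade-off can be made concrete with some rough
arithmetic. The sketch below is a toy model, not HBase code: it assumes that
an idle replication source wakes up once per sleep interval and that every
wake-up costs one check against HDFS. The function name
idle_checks_per_second and the num_sources parameter are hypothetical.

```python
def idle_checks_per_second(sleep_ms, num_sources=1):
    """Upper bound on idle wake-ups per second.

    Toy model: each idle source wakes up once per sleep interval, and each
    wake-up is assumed to cost one check against HDFS.
    """
    if sleep_ms <= 0:
        raise ValueError("sleep_ms must be positive")
    return num_sources * 1000.0 / sleep_ms


# 1000 ms is the default mentioned above, 100 ms is the value used in the
# test, and 10 ms stands for a "too small" setting.
for sleep_ms in (1000, 100, 10):
    rate = idle_checks_per_second(sleep_ms)
    print(f"sleep {sleep_ms} ms: up to {rate:.0f} idle checks/s per source")
```

Under these assumptions, every tenfold reduction of the sleep interval costs
ten times as many idle checks per source, and that rate is multiplied again by
the number of sources in the cluster.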

I think this will be a strong reason to get HBASE-26233 in.

> Run some tests to verify the new region replication framework
> -------------------------------------------------------------
>
>                 Key: HBASE-26487
>                 URL: https://issues.apache.org/jira/browse/HBASE-26487
>             Project: HBase
>          Issue Type: Sub-task
>          Components: integration tests, test
>            Reporter: Duo Zhang
>            Assignee: Duo Zhang
>            Priority: Major
>
> Make sure there are no big bugs before merging back.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
