[
https://issues.apache.org/jira/browse/HBASE-26487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17460826#comment-17460826
]
Duo Zhang commented on HBASE-26487:
-----------------------------------
I used the RegionReplicationLagEvaluation tool introduced in HBASE-26540 to
compare the replication lag of the master branch and the HBASE-26233 branch.
For HBASE-26233, I executed the following command 3 times:
{noformat}
./bin/hbase org.apache.hadoop.hbase.RegionReplicationLagEvaluation -vlen 1024 -r 10000
{noformat}
The results are
{noformat}
2021-12-16T23:16:50,280 INFO [main] hbase.RegionReplicationLagEvaluation: Test finished, min lag 0 ms, max lag 22 ms, mean lag 0 ms
2021-12-16T23:16:50,280 INFO [main] hbase.RegionReplicationLagEvaluation: 25.0% lag: 0 ms
2021-12-16T23:16:50,280 INFO [main] hbase.RegionReplicationLagEvaluation: 50.0% lag: 0 ms
2021-12-16T23:16:50,280 INFO [main] hbase.RegionReplicationLagEvaluation: 75.0% lag: 0 ms
2021-12-16T23:16:50,280 INFO [main] hbase.RegionReplicationLagEvaluation: 90.0% lag: 0 ms
2021-12-16T23:16:50,280 INFO [main] hbase.RegionReplicationLagEvaluation: 95.0% lag: 0 ms
2021-12-16T23:16:50,280 INFO [main] hbase.RegionReplicationLagEvaluation: 98.0% lag: 1 ms
2021-12-16T23:16:50,280 INFO [main] hbase.RegionReplicationLagEvaluation: 99.0% lag: 2 ms
2021-12-16T23:16:50,280 INFO [main] hbase.RegionReplicationLagEvaluation: 99.9% lag: 3 ms
2021-12-16T23:17:20,229 INFO [main] hbase.RegionReplicationLagEvaluation: Test finished, min lag 0 ms, max lag 7 ms, mean lag 0 ms
2021-12-16T23:17:20,229 INFO [main] hbase.RegionReplicationLagEvaluation: 25.0% lag: 0 ms
2021-12-16T23:17:20,229 INFO [main] hbase.RegionReplicationLagEvaluation: 50.0% lag: 0 ms
2021-12-16T23:17:20,229 INFO [main] hbase.RegionReplicationLagEvaluation: 75.0% lag: 0 ms
2021-12-16T23:17:20,230 INFO [main] hbase.RegionReplicationLagEvaluation: 90.0% lag: 0 ms
2021-12-16T23:17:20,230 INFO [main] hbase.RegionReplicationLagEvaluation: 95.0% lag: 0 ms
2021-12-16T23:17:20,230 INFO [main] hbase.RegionReplicationLagEvaluation: 98.0% lag: 0 ms
2021-12-16T23:17:20,230 INFO [main] hbase.RegionReplicationLagEvaluation: 99.0% lag: 0 ms
2021-12-16T23:17:20,230 INFO [main] hbase.RegionReplicationLagEvaluation: 99.9% lag: 3 ms
2021-12-16T23:19:25,922 INFO [main] hbase.RegionReplicationLagEvaluation: Test finished, min lag 0 ms, max lag 17 ms, mean lag 0 ms
2021-12-16T23:19:25,922 INFO [main] hbase.RegionReplicationLagEvaluation: 25.0% lag: 0 ms
2021-12-16T23:19:25,923 INFO [main] hbase.RegionReplicationLagEvaluation: 50.0% lag: 0 ms
2021-12-16T23:19:25,923 INFO [main] hbase.RegionReplicationLagEvaluation: 75.0% lag: 0 ms
2021-12-16T23:19:25,923 INFO [main] hbase.RegionReplicationLagEvaluation: 90.0% lag: 0 ms
2021-12-16T23:19:25,923 INFO [main] hbase.RegionReplicationLagEvaluation: 95.0% lag: 0 ms
2021-12-16T23:19:25,923 INFO [main] hbase.RegionReplicationLagEvaluation: 98.0% lag: 0 ms
2021-12-16T23:19:25,923 INFO [main] hbase.RegionReplicationLagEvaluation: 99.0% lag: 0 ms
2021-12-16T23:19:25,923 INFO [main] hbase.RegionReplicationLagEvaluation: 99.9% lag: 3 ms
{noformat}
For the master branch, I set replication.source.sleepforretries to 100 ms,
since the default of 1000 ms would make the test run for a very long time. To
make the test finish faster, I also reduced the number of rows from 10000 to
1000, and again executed the command 3 times.
{noformat}
./bin/hbase org.apache.hadoop.hbase.RegionReplicationLagEvaluation -vlen 1024 -r 1000
{noformat}
The results are
{noformat}
2021-12-16T23:38:30,805 INFO [main] hbase.RegionReplicationLagEvaluation: Test finished, min lag 3 ms, max lag 349 ms, mean lag 101 ms
2021-12-16T23:38:30,805 INFO [main] hbase.RegionReplicationLagEvaluation: 25.0% lag: 14 ms
2021-12-16T23:38:30,805 INFO [main] hbase.RegionReplicationLagEvaluation: 50.0% lag: 98 ms
2021-12-16T23:38:30,805 INFO [main] hbase.RegionReplicationLagEvaluation: 75.0% lag: 184 ms
2021-12-16T23:38:30,806 INFO [main] hbase.RegionReplicationLagEvaluation: 90.0% lag: 187 ms
2021-12-16T23:38:30,806 INFO [main] hbase.RegionReplicationLagEvaluation: 95.0% lag: 188 ms
2021-12-16T23:38:30,806 INFO [main] hbase.RegionReplicationLagEvaluation: 98.0% lag: 190 ms
2021-12-16T23:38:30,806 INFO [main] hbase.RegionReplicationLagEvaluation: 99.0% lag: 192 ms
2021-12-16T23:38:30,806 INFO [main] hbase.RegionReplicationLagEvaluation: 99.9% lag: 200 ms
2021-12-16T23:40:51,459 INFO [main] hbase.RegionReplicationLagEvaluation: Test finished, min lag 2 ms, max lag 290 ms, mean lag 101 ms
2021-12-16T23:40:51,459 INFO [main] hbase.RegionReplicationLagEvaluation: 25.0% lag: 15 ms
2021-12-16T23:40:51,459 INFO [main] hbase.RegionReplicationLagEvaluation: 50.0% lag: 85 ms
2021-12-16T23:40:51,460 INFO [main] hbase.RegionReplicationLagEvaluation: 75.0% lag: 183 ms
2021-12-16T23:40:51,460 INFO [main] hbase.RegionReplicationLagEvaluation: 90.0% lag: 185 ms
2021-12-16T23:40:51,460 INFO [main] hbase.RegionReplicationLagEvaluation: 95.0% lag: 187 ms
2021-12-16T23:40:51,460 INFO [main] hbase.RegionReplicationLagEvaluation: 98.0% lag: 188 ms
2021-12-16T23:40:51,460 INFO [main] hbase.RegionReplicationLagEvaluation: 99.0% lag: 188 ms
2021-12-16T23:40:51,460 INFO [main] hbase.RegionReplicationLagEvaluation: 99.9% lag: 196 ms
2021-12-16T23:42:50,438 INFO [main] hbase.RegionReplicationLagEvaluation: Test finished, min lag 2 ms, max lag 289 ms, mean lag 101 ms
2021-12-16T23:42:50,439 INFO [main] hbase.RegionReplicationLagEvaluation: 25.0% lag: 15 ms
2021-12-16T23:42:50,439 INFO [main] hbase.RegionReplicationLagEvaluation: 50.0% lag: 91 ms
2021-12-16T23:42:50,439 INFO [main] hbase.RegionReplicationLagEvaluation: 75.0% lag: 183 ms
2021-12-16T23:42:50,439 INFO [main] hbase.RegionReplicationLagEvaluation: 90.0% lag: 184 ms
2021-12-16T23:42:50,439 INFO [main] hbase.RegionReplicationLagEvaluation: 95.0% lag: 186 ms
2021-12-16T23:42:50,439 INFO [main] hbase.RegionReplicationLagEvaluation: 98.0% lag: 188 ms
2021-12-16T23:42:50,439 INFO [main] hbase.RegionReplicationLagEvaluation: 99.0% lag: 188 ms
2021-12-16T23:42:50,439 INFO [main] hbase.RegionReplicationLagEvaluation: 99.9% lag: 196 ms
{noformat}
The results for HBASE-26233 are much better, as expected, since it sends out
the WAL edits directly, without touching the WAL files. It is exciting that
the latency is really small: even the 99.9% lag is well below 10 ms (3 ms in
all three runs).
For the master branch, the latency depends heavily on the
replication.source.sleepforretries config, as we will sleep for a while if
there is nothing to replicate. But we cannot set
replication.source.sleepforretries too small, as that would put more pressure
on the HDFS namenode.
I think this is a strong reason to get HBASE-26233 in.
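For anyone who wants to compare runs side by side, here is a minimal sketch that pulls the summary and percentile values out of the tool output shown above. It assumes the log format stays exactly as printed in this comment; parse_runs is a hypothetical helper written for illustration and is not part of HBase or the evaluation tool.

```python
import re

# Patterns taken from the output format shown in this comment (an assumption:
# the tool may print differently in other versions).
SUMMARY_RE = re.compile(r"min lag (\d+) ms, max lag (\d+) ms, mean lag (\d+) ms")
PCT_RE = re.compile(r"(\d+(?:\.\d+)?)% lag: (\d+) ms")


def parse_runs(log_text):
    """Return one dict per test run with min/max/mean and a percentile map."""
    # Collapse whitespace so records that were hard-wrapped across two lines
    # (as in mail notifications) still match.
    text = " ".join(log_text.split())
    events = [(m.start(), "summary", m) for m in SUMMARY_RE.finditer(text)]
    events += [(m.start(), "pct", m) for m in PCT_RE.finditer(text)]
    runs = []
    # Walk matches in order of appearance; each summary line starts a new run.
    for _, kind, m in sorted(events, key=lambda e: e[0]):
        if kind == "summary":
            lo, hi, mean = (int(g) for g in m.groups())
            runs.append({"min": lo, "max": hi, "mean": mean, "percentiles": {}})
        elif runs:
            runs[-1]["percentiles"][float(m.group(1))] = int(m.group(2))
    return runs


# Excerpt of the second HBASE-26233 run from above.
sample = """
2021-12-16T23:17:20,229 INFO [main] hbase.RegionReplicationLagEvaluation: Test
finished, min lag 0 ms, max lag 7 ms, mean lag 0 ms
2021-12-16T23:17:20,230 INFO [main] hbase.RegionReplicationLagEvaluation:
99.0% lag: 0 ms
2021-12-16T23:17:20,230 INFO [main] hbase.RegionReplicationLagEvaluation:
99.9% lag: 3 ms
"""

for run in parse_runs(sample):
    print("max", run["max"], "p99.9", run["percentiles"][99.9])
```

Feeding it the full output of both branches gives one dict per run, which makes it easy to line up the 99.9% lag of HBASE-26233 against master.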
> Run some tests to verify the new region replication framework
> -------------------------------------------------------------
>
> Key: HBASE-26487
> URL: https://issues.apache.org/jira/browse/HBASE-26487
> Project: HBase
> Issue Type: Sub-task
> Components: integration tests, test
> Reporter: Duo Zhang
> Assignee: Duo Zhang
> Priority: Major
>
> Make sure there are no big bugs before merging back.