Meeting notes:

Attendees: Duo Zhang, Yu Li, Xin Sun, Tianhang Tang, Liangjun He

First, Liangjun introduced the old implementation of 
ReplicationSyncUp/DumpReplicationQueues, as well as the existing problems and 
preliminary solutions under the new replication implementation. Then we 
discussed the possible solutions; the following is a summary of the 
discussion:

ReplicationSyncUp tool

1. The ReplicationSyncUp tool replicates the remaining data to the backup 
cluster when the master cluster crashes, but once the master cluster has 
crashed, the tool cannot access the hbase table. Is it possible to copy the 
replication queue info to ZK whenever it is written to the hbase table, and 
then implement the ReplicationSyncUp tool on top of ZK again?

Since our goal is to reduce the reliance on ZK for storing replication queue 
info, this would break that goal. Perhaps we could use the HMaster maintenance 
mode to bring up hbase:meta and then perform additional repair operations, but 
since HMaster maintenance mode only supports access to hbase:meta from inside 
the HMaster, and external clients cannot access it, this approach cannot be 
used.

After discussion, we all agreed that the problem is difficult to solve without 
relying on external storage under the new replication implementation. If ZK 
(or a third-party storage system) is used, we will have a data sync problem: 
how to sync the replication queue info to ZK (if it is synced in real time, 
how do we ensure consistency between the write to the hbase table and the 
write to ZK; if it is synced periodically, some redundant data will be 
replicated when ReplicationSyncUp is executed, though partially redundant data 
may be acceptable), and how to sync the replication queue info back from ZK to 
the hbase:replication table once the master cluster has recovered.

Going further, we can also solve the problem of ReplicationSyncUp accessing 
replication queue info based on a snapshot of the hbase:replication table: for 
example, a snapshot of the hbase:replication table is generated periodically; 
when the ReplicationSyncUp tool is executed, that snapshot is loaded into 
memory. After the ReplicationSyncUp tool has finished and the data has been 
replicated completely, we regenerate a new snapshot from the in-memory info 
and write it to the file system. When the master cluster is recovered, the 
HMaster restores the hbase:replication table from the new snapshot.
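The snapshot-based flow above could be sketched roughly as follows. This is a conceptual, stdlib-only sketch, not HBase code; the class and method names (SyncUpSnapshotSketch, loadSnapshot, advanceOffset, writeNewSnapshot) and the tab-separated snapshot format are all illustrative assumptions:

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.*;

// Conceptual sketch (not HBase API) of the snapshot-based flow: load the
// periodic snapshot of hbase:replication into memory, advance offsets as
// data is replicated, then persist a new snapshot for the recovered
// HMaster to restore.
public class SyncUpSnapshotSketch {
    // In-memory view of the queue info: "peerId-walFile" -> replicated offset.
    private final Map<String, Long> queueOffsets = new HashMap<>();

    // Load the periodically generated snapshot (here: one "queue\toffset"
    // pair per line) into memory before ReplicationSyncUp starts shipping.
    void loadSnapshot(Path snapshotFile) throws IOException {
        for (String line : Files.readAllLines(snapshotFile)) {
            String[] parts = line.split("\t");
            queueOffsets.put(parts[0], Long.parseLong(parts[1]));
        }
    }

    // Called as ReplicationSyncUp ships edits to the backup cluster; only
    // ever moves an offset forward.
    void advanceOffset(String queue, long newOffset) {
        queueOffsets.merge(queue, newOffset, Math::max);
    }

    // After replication completes, write the updated state back so the
    // recovered HMaster can restore it into hbase:replication.
    void writeNewSnapshot(Path snapshotFile) throws IOException {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Long> e : queueOffsets.entrySet()) {
            lines.add(e.getKey() + "\t" + e.getValue());
        }
        Files.write(snapshotFile, lines);
    }
}
```

The key property is that the tool never touches the (unavailable) hbase:replication table directly; it works only against the snapshot file.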

2. If the ReplicationSyncUp tool is implemented based on the 
hbase:replication snapshot, then after ReplicationSyncUp is executed and the 
data is replicated completely, it is necessary to ensure that when the master 
cluster is recovered the HMaster starts first and restores the snapshot to 
hbase:replication, to avoid the situation where a RegionServer starts first 
and replicates redundant data to the backup cluster. However, the master 
cluster cannot guarantee that the HMaster will start before the RegionServers 
when the cluster is recovered, so how do we ensure that the HMaster restores 
the snapshot to the hbase:replication table first?

Option 1: If a RegionServer starts first and finds that a corresponding 
snapshot of the hbase:replication table exists, its replication-related 
operations wait until the HMaster has started and restored the snapshot to the 
hbase:replication table, and only then does the RegionServer continue to 
replicate data. The advantage of this approach is that it is transparent to 
the user, but the implementation is complicated.

Option 2: After ReplicationSyncUp is executed, we disable the peer. Even if a 
RegionServer starts first, no replication will happen until the HMaster has 
started and restored the snapshot to the hbase:replication table. The 
disadvantage of this approach is that it may confuse users, because the peer 
is disabled without their knowledge; the advantage is that the implementation 
is simple.
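The ordering guarantee that Option 2 relies on can be sketched as a simple gate. Again this is an illustrative stdlib-only sketch, not the HBase peer implementation; the names (PeerGateSketch, syncUpFinished, snapshotRestored, canReplicate) are assumptions:

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Conceptual sketch of Option 2's ordering guarantee: ReplicationSyncUp
// disables the peer when it finishes, so a RegionServer that starts first
// sees a disabled peer and ships nothing; only after the HMaster restores
// the snapshot is the peer enabled again.
public class PeerGateSketch {
    private final AtomicBoolean peerEnabled = new AtomicBoolean(true);

    // Run by ReplicationSyncUp after it has replicated all remaining data:
    // the now-stale queue info must not be replayed by RegionServers.
    void syncUpFinished() {
        peerEnabled.set(false);
    }

    // Run by the recovered HMaster after restoring hbase:replication.
    void snapshotRestored() {
        peerEnabled.set(true);
    }

    // A RegionServer replication source only ships edits while the peer
    // is enabled.
    boolean canReplicate() {
        return peerEnabled.get();
    }
}
```

The simplicity comes from reusing the existing peer enabled/disabled state as the gate, at the cost of the user-visible disabled peer noted above.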

At present, it seems that no solution solves the problem perfectly.

DumpReplicationQueues tool

1. Under the new replication implementation, most of the info output by the 
DumpReplicationQueues tool can be obtained through the new interface and is 
consistent with the old implementation. The difference is that each queue in 
the new replication implementation only stores one wal and the corresponding 
offset, while the old implementation stored all wal files and offsets under 
the queue, so the old DumpReplicationQueues tool included all wal files and 
offsets when outputting queue info. In the new implementation we can also 
access the file system directly to get all the wal files corresponding to a 
queue, which makes the output completely consistent with the old 
DumpReplicationQueues tool, and it does not cost too much; but is it 
necessary?

It is recommended that the output of the wal files and offset info be kept 
consistent with the old version, to avoid users being unable to upgrade their 
HBase version because they depend on the output of the DumpReplicationQueues 
tool.
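Recovering the full wal list from the file system, as described above, amounts to listing the wal directory and filtering by the queue's wal prefix (wal names sort by creation time). A minimal sketch, with illustrative names (WalListingSketch, walsFor) and not the actual HBase implementation:

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.*;
import java.util.stream.*;

// Sketch of reconstructing a queue's full wal list from the file system,
// so DumpReplicationQueues can print the same wal list as the old
// implementation even though the new queue storage keeps only the
// current wal and offset.
public class WalListingSketch {
    // Return every wal file in walDir whose name starts with walPrefix,
    // sorted by name (which tracks creation order for wal files).
    static List<String> walsFor(Path walDir, String walPrefix) throws IOException {
        try (Stream<Path> files = Files.list(walDir)) {
            return files.map(p -> p.getFileName().toString())
                        .filter(n -> n.startsWith(walPrefix))
                        .sorted()
                        .collect(Collectors.toList());
        }
    }
}
```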

Thanks.




On 2022-08-31 00:01:22, "何良均" <[email protected]> wrote:

Last time we discussed the design doc "Move replication queue storage from 
zookeeper to a separated HBase table", but the replication tools part was not 
discussed. 
This time we decided to discuss this part.


We plan to hold an online meeting from 2PM to 3PM, 31 Aug, GMT+8, using 
Tencent Meeting.


何良均 invites you to a Tencent Meeting
Meeting topic: replication tool discussion
Meeting time: 2022/08/31 14:00-15:00 (GMT+08:00) China Standard Time - Beijing


Click the link to join the meeting, or add it to your meeting list:
https://meeting.tencent.com/dm/norZvACxGtya


#Tencent Meeting ID: 982-412-761
Meeting password: 210189


More attendees are always welcomed.
