Thanks for the detailed write-up!

何良均 <[email protected]>于2022年9月1日 周四23:33写道:

> Meeting notes:
>
> Attendees: Duo Zhang, Yu Li, Xin Sun, Tianhang Tang, Liangjun He
>
> First, Liangjun introduced the old implementation of
> ReplicationSyncUp/DumpReplicationQueues, as well as the existing problems
> and the preliminary solutions under the new replication implementation. We
> then discussed the proposed solutions; the discussion is summarized below:
>
> ReplicationSyncUp tool
>
> 1. The ReplicationSyncUp tool replicates the remaining data to the backup
> cluster when the master cluster crashes. However, once the master cluster
> has crashed, the tool can no longer access the hbase:replication table. Is
> it possible to also copy the replication queue info to ZK when it is written
> to the hbase table, and then implement the ReplicationSyncUp tool on top of
> ZK again?
>
> Since our goal is to reduce the reliance on ZK for storing replication queue
> info, this approach would break that goal. We could perhaps use HMaster
> maintenance mode to bring up hbase:meta and then perform additional repair
> operations, but maintenance mode only allows the HMaster itself to access
> hbase:meta internally; external clients cannot access it, so this approach
> cannot be used.
>
> After discussion, we all agreed that without relying on external storage it
> is difficult to solve the problem under the new replication implementation.
> If ZK (or a third-party storage system) is used, we will have a data sync
> problem: how to sync the replication queue info to ZK (if it is synced in
> real time, how do we ensure consistency between the write to the
> hbase:replication table and the write to ZK; if it is synced periodically,
> some redundant data will be replicated when ReplicationSyncUp is executed,
> though partially redundant data may be acceptable), and, when the master
> cluster is recovered, how to sync the replication queue info from ZK back to
> the hbase:replication table.
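>
> For illustration only, a minimal sketch of what the real-time sync-writing
> variant could look like (the ZK path, column family, and row-key layout
> below are made-up assumptions, not part of the design); the gap between the
> two writes is exactly where the consistency problem sits:
>
>   // Hypothetical sketch of real-time dual-writing a queue offset; the ZK
>   // path and column layout are invented for illustration only.
>   import org.apache.hadoop.hbase.TableName;
>   import org.apache.hadoop.hbase.client.Connection;
>   import org.apache.hadoop.hbase.client.Put;
>   import org.apache.hadoop.hbase.client.Table;
>   import org.apache.hadoop.hbase.util.Bytes;
>   import org.apache.zookeeper.ZooKeeper;
>
>   public class QueueOffsetDualWriter {
>     public static void writeOffset(Connection conn, ZooKeeper zk,
>         String peerId, String serverName, String wal, long position)
>         throws Exception {
>       byte[] offset = Bytes.toBytes(wal + "=" + position);
>
>       // 1. Write to the hbase:replication table (the source of truth).
>       try (Table table = conn.getTable(TableName.valueOf("hbase:replication"))) {
>         Put put = new Put(Bytes.toBytes(peerId + "-" + serverName));
>         put.addColumn(Bytes.toBytes("queue"), Bytes.toBytes("offset"), offset);
>         table.put(put);
>       }
>
>       // 2. Mirror the offset to ZK (node assumed to already exist) so that
>       //    ReplicationSyncUp can read it when the table is unavailable. If
>       //    the process dies between step 1 and step 2, the two copies
>       //    diverge; this is the consistency problem discussed above.
>       zk.setData("/hbase/replication-mirror/" + peerId + "/" + serverName,
>           offset, -1);
>     }
>   }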
>
> Further, we could also solve the problem of ReplicationSyncUp accessing the
> replication queue info by using a snapshot of the hbase:replication table.
> For example, a snapshot of the hbase:replication table is generated
> periodically; when the ReplicationSyncUp tool is executed, that snapshot is
> loaded into memory. After the ReplicationSyncUp tool has finished and the
> data has been replicated completely, we regenerate a new snapshot from the
> in-memory info and write it to the file system. When the master cluster is
> recovered, the HMaster restores the hbase:replication table from the new
> snapshot.
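>
> A rough sketch of the periodic-snapshot and restore steps using the public
> Admin API (the snapshot naming scheme is an assumption; how ReplicationSyncUp
> rewrites the snapshot from memory is not covered here):
>
>   // Sketch of the snapshot-based flow; snapshot names are placeholders.
>   import org.apache.hadoop.hbase.TableName;
>   import org.apache.hadoop.hbase.client.Admin;
>
>   public class ReplicationTableSnapshots {
>     private static final TableName REPLICATION_TABLE =
>         TableName.valueOf("hbase:replication");
>     private static final String SNAPSHOT_PREFIX = "replication_queue_snapshot_";
>
>     // Run periodically while the cluster is healthy.
>     public static String takeSnapshot(Admin admin) throws Exception {
>       String name = SNAPSHOT_PREFIX + System.currentTimeMillis();
>       admin.snapshot(name, REPLICATION_TABLE);
>       return name;
>     }
>
>     // Run by the HMaster on recovery, after ReplicationSyncUp has written
>     // the regenerated snapshot to the file system.
>     public static void restore(Admin admin, String snapshotName) throws Exception {
>       admin.disableTable(REPLICATION_TABLE);  // table must be disabled before restore
>       admin.restoreSnapshot(snapshotName);
>       admin.enableTable(REPLICATION_TABLE);
>     }
>   }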
>
> 2. If the ReplicationSyncUp tool is implemented on top of the
> hbase:replication snapshot, then after ReplicationSyncUp has run and the
> data has been replicated completely, we need to ensure that when the master
> cluster is recovered the HMaster starts first and restores the snapshot to
> hbase:replication, so as to avoid the situation where the RegionServer
> starts first and replicates redundant data to the backup cluster. However,
> the master cluster cannot guarantee that the HMaster will start before the
> RegionServers during recovery, so how do we ensure that the HMaster restores
> the snapshot to the hbase:replication table first?
>
> Option 1: If the RegionServer starts first and finds that a corresponding
> snapshot of the hbase:replication table exists, its replication-related
> operations wait until the HMaster has started and restored the snapshot to
> the hbase:replication table, and only then does the RegionServer continue
> replicating data. The advantage of this approach is that it is transparent
> to the user, but the implementation is complicated.
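>
> Purely as an illustration of Option 1, a sketch of the RegionServer-side
> wait (the snapshot name, the convention that the HMaster removes the
> snapshot after restoring it, and using the file system as the signal are all
> assumptions):
>
>   // Sketch: before starting replication, wait while a ReplicationSyncUp
>   // snapshot is still present; ".hbase-snapshot" is the default completed-
>   // snapshot directory, the snapshot name and removal convention are assumed.
>   import org.apache.hadoop.conf.Configuration;
>   import org.apache.hadoop.fs.FileSystem;
>   import org.apache.hadoop.fs.Path;
>   import org.apache.hadoop.hbase.HBaseConfiguration;
>
>   public class WaitForReplicationRestore {
>     public static void awaitRestore(Path hbaseRootDir) throws Exception {
>       Configuration conf = HBaseConfiguration.create();
>       FileSystem fs = hbaseRootDir.getFileSystem(conf);
>       Path marker = new Path(hbaseRootDir,
>           ".hbase-snapshot/replication_syncup_snapshot");
>       while (fs.exists(marker)) {
>         Thread.sleep(10_000);  // hold back replication start-up until restored
>       }
>     }
>   }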
>
> Option 2: After ReplicationSyncUp has finished, we disable the peer. Even if
> the RegionServer starts first, replication will not run until the HMaster
> has started and restored the snapshot to the hbase:replication table. The
> disadvantage of this approach is that it can confuse users, because the peer
> is disabled without their knowledge; the advantage is that the
> implementation is simple.
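>
> The peer fencing in Option 2 only needs the existing peer API, roughly as
> below (the peer id is a placeholder; note the Admin API needs a reachable
> cluster, so while the master cluster is still down ReplicationSyncUp would
> presumably have to write the disabled state into the peer storage directly):
>
>   // Sketch of Option 2 using the replication peer Admin API.
>   import org.apache.hadoop.hbase.client.Admin;
>
>   public class PeerGate {
>     // After ReplicationSyncUp: stop further replication to the backup cluster.
>     public static void fence(Admin admin, String peerId) throws Exception {
>       admin.disableReplicationPeer(peerId);
>     }
>
>     // After the HMaster has restored the hbase:replication snapshot.
>     public static void unfence(Admin admin, String peerId) throws Exception {
>       admin.enableReplicationPeer(peerId);
>     }
>   }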
>
> At present, it seems that no solution solves the problem perfectly.
>
> DumpReplicationQueues tool
>
> 1. Under the new replication implementation, most of the info output by the
> DumpReplicationQueues tool can be obtained through the new interface,
> consistent with the old implementation. The difference is that each queue in
> the new implementation only stores one WAL and its offset info, whereas the
> old implementation stored all the WAL files and offset info under the queue,
> so the old DumpReplicationQueues tool included all WAL files and offsets
> when outputting queue info. In the new implementation we can also access the
> file system directly to get all the WAL files belonging to a queue, which
> would make the output fully consistent with the old DumpReplicationQueues
> tool and would not cost too much, but is it necessary?
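>
> For reference, getting all the WAL files of a queue directly from the file
> system would roughly amount to listing the server's WAL directories; the
> sketch below assumes the default layout (<rootdir>/WALs/<server> and
> <rootdir>/oldWALs) and is not the tool's actual code:
>
>   // Sketch: list the WAL files of one region server straight from the file
>   // system, similar to what the old DumpReplicationQueues output contained.
>   import java.util.ArrayList;
>   import java.util.List;
>   import org.apache.hadoop.conf.Configuration;
>   import org.apache.hadoop.fs.FileStatus;
>   import org.apache.hadoop.fs.FileSystem;
>   import org.apache.hadoop.fs.Path;
>   import org.apache.hadoop.hbase.HBaseConfiguration;
>
>   public class ListQueueWals {
>     public static List<String> walsForServer(Path rootDir, String serverName)
>         throws Exception {
>       Configuration conf = HBaseConfiguration.create();
>       FileSystem fs = rootDir.getFileSystem(conf);
>       List<String> wals = new ArrayList<>();
>       Path[] dirs = { new Path(rootDir, "WALs/" + serverName),
>                       new Path(rootDir, "oldWALs") };
>       for (Path dir : dirs) {
>         if (!fs.exists(dir)) {
>           continue;
>         }
>         for (FileStatus status : fs.listStatus(dir)) {
>           // WAL file names start with the name of the server that wrote them.
>           if (status.getPath().getName().startsWith(serverName)) {
>             wals.add(status.getPath().getName());
>           }
>         }
>       }
>       return wals;
>     }
>   }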
>
> It is recommended that the WAL file and offset info in the output stay
> consistent with the old version, to avoid users being unable to upgrade
> their HBase version because they depend on the output of the
> DumpReplicationQueues tool.
>
> Thanks.
>
>
>
>
> On 2022-08-31 00:01:22, "何良均" <[email protected]> wrote:
>
> Last time we discussed the design doc "Move replication queue storage from
> zookeeper to a separated HBase table", but the replication tools part was
> not discussed.
> This time we decided to discuss that part.
>
>
> We plan to hold an online meeting from 2 PM to 3 PM, 31 Aug, GMT+8, using
> Tencent Meeting.
>
>
> 何良均 invites you to a Tencent Meeting
> Meeting topic: replication tool discussion
> Meeting time: 2022/08/31 14:00-15:00 (GMT+08:00) China Standard Time - Beijing
>
>
> Click the link to join the meeting, or add it to your meeting list:
> https://meeting.tencent.com/dm/norZvACxGtya
>
>
> #Tencent Meeting ID: 982-412-761
> Meeting password: 210189
>
>
> More attendees are always welcome.
>
>
