The idea of using an HBase table to replace the replication queue storage is very good.





At 2022-09-02 14:08:30, "张铎(Duo Zhang)" <[email protected]> wrote:
>Thanks for the detailed write up!
>
>何良均 <[email protected]> wrote on Thu, Sep 1, 2022 at 23:33:
>
>> Meetings notes:
>>
>> Attendees: Duo Zhang, Yu Li, Xin Sun, Tianhang Tang, Liangjun He
>>
>> First Liangjun introduced the old implementation of
>> ReplicationSyncUp/DumpReplicationQueues, as well as the existing problems
>> and preliminary solutions under the new replication implementation. Then we
>> discussed the relevant solutions, the following is the content of the
>> discussion:
>>
>> ReplicationSyncUp tool
>>
>> 1. The ReplicationSyncUp tool replicates the remaining data to the
>> backup cluster after the master cluster crashes, but once the master
>> cluster is down, the tool cannot access the hbase table. Is it possible
>> to copy the replication queue info to ZK whenever it is written to the
>> hbase table, and then implement the ReplicationSyncUp tool on top of ZK
>> again?
>>
>> Since our goal is to reduce the reliance on ZK for storing replication
>> queue info, this would break that goal. Maybe we could use HMaster
>> maintenance mode to bring up hbase:meta and then perform additional
>> repair operations, but since maintenance mode only allows access to
>> hbase:meta from within the HMaster and external clients cannot access
>> it, this approach cannot be used.
>>
>> After discussion, we all agreed that the problem is difficult to solve
>> under the new replication implementation without relying on external
>> storage. If ZK (or another third-party storage system) is used, we will
>> have a data-sync problem: how to sync the replication queue info to ZK
>> (if it is synced in real time, how do we ensure consistency between the
>> writes to the hbase table and to ZK; if it is synced periodically, some
>> redundant data will be replicated when ReplicationSyncUp runs, though
>> partially redundant data may be acceptable), and, once the master
>> cluster is recovered, how to sync the replication queue info from ZK
>> back to the hbase:replication table.
>>
>> Further, we could also solve the problem of ReplicationSyncUp accessing
>> replication queue info via a snapshot of the hbase:replication table:
>> for example, periodically generate a snapshot of the hbase:replication
>> table; when the ReplicationSyncUp tool runs, load that snapshot into
>> memory; and after the tool finishes and the data has been fully
>> replicated, regenerate a new snapshot from the in-memory info and write
>> it to the file system. When the master cluster is recovered, the
>> HMaster will restore the hbase:replication table from the new snapshot.
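>> The snapshot-based flow above could look roughly like the following
>> sketch. All class and method names are hypothetical, and a plain text
>> file stands in for a real hbase:replication snapshot; this is only an
>> illustration of the load / replicate / regenerate cycle, not the actual
>> implementation:

```java
import java.io.*;
import java.nio.file.*;
import java.util.*;

// Hypothetical sketch of the snapshot-based ReplicationSyncUp flow:
// load queue offsets from the last snapshot, replicate the remaining
// data, then persist a new snapshot reflecting the final offsets so
// the HMaster can restore hbase:replication from it on recovery.
public class SyncUpSketch {

    // Load queue -> offset info from the latest snapshot file.
    static Map<String, Long> loadSnapshot(Path snapshot) throws IOException {
        Map<String, Long> queues = new TreeMap<>();
        if (Files.exists(snapshot)) {
            for (String line : Files.readAllLines(snapshot)) {
                String[] parts = line.split("=");
                queues.put(parts[0], Long.parseLong(parts[1]));
            }
        }
        return queues;
    }

    // Stand-in for "replicate the remaining data": advance every
    // queue offset to the end of its (simulated) WAL.
    static void replicateRemaining(Map<String, Long> queues, Map<String, Long> walEnds) {
        for (Map.Entry<String, Long> e : walEnds.entrySet()) {
            queues.merge(e.getKey(), e.getValue(), Math::max);
        }
    }

    // Regenerate the snapshot from the in-memory info and write it
    // back to the file system.
    static void writeSnapshot(Map<String, Long> queues, Path snapshot) throws IOException {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Long> e : queues.entrySet()) {
            lines.add(e.getKey() + "=" + e.getValue());
        }
        Files.write(snapshot, lines);
    }
}
```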
>>
>> 2. If the ReplicationSyncUp tool is implemented on top of an
>> hbase:replication snapshot, then after the tool finishes and the data
>> has been fully replicated, we need to ensure that the HMaster starts
>> first when the master cluster is recovered and restores the snapshot to
>> hbase:replication, so as to avoid a RegionServer starting first and
>> replicating redundant data to the backup cluster. But the cluster
>> cannot guarantee that the HMaster starts before the RegionServers, so
>> how do we ensure that the HMaster restores the snapshot to the
>> hbase:replication table first?
>>
>> Option 1: If a RegionServer starts first and finds a corresponding
>> snapshot of the hbase:replication table, its replication-related
>> operations wait until the HMaster has started and restored the snapshot
>> to the hbase:replication table; only then does the RegionServer resume
>> replicating data. The advantage of this approach is that it is
>> transparent to the user, but the implementation is complicated.
>>
>> Option 2: After ReplicationSyncUp finishes, we disable the peer. Even
>> if a RegionServer starts first, no replication will run until the
>> HMaster has started and restored the snapshot to the hbase:replication
>> table. The disadvantage of this approach is that it may confuse users,
>> because the peer is disabled without their knowledge; the advantage is
>> that the implementation is simple.
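>> The gating behind Option 2 could be sketched as below (hypothetical
>> names; in reality the state check would go through the peer storage,
>> not an in-memory map):

```java
import java.util.*;

// Hypothetical sketch of Option 2: ReplicationSyncUp disables the peer
// after it finishes, and a RegionServer only ships edits while the peer
// is enabled, so nothing is replicated until the HMaster has restored
// the snapshot and re-enabled the peer.
public class PeerGateSketch {
    enum PeerState { ENABLED, DISABLED }

    private final Map<String, PeerState> peers = new HashMap<>();

    void setState(String peerId, PeerState s) {
        peers.put(peerId, s);
    }

    // A RegionServer replication source checks this before shipping a batch;
    // an unknown peer is treated as disabled.
    boolean canShip(String peerId) {
        return peers.getOrDefault(peerId, PeerState.DISABLED) == PeerState.ENABLED;
    }
}
```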
>>
>> At present, it seems that no solution solves the problem perfectly.
>>
>> DumpReplicationQueues tool
>>
>> 1. Under the new replication implementation, most of the info output
>> by the DumpReplicationQueues tool can be obtained through the new
>> interface, consistent with the old implementation. The difference is
>> that each queue in the new implementation stores only one WAL and the
>> corresponding offset, while the old implementation stored all the WAL
>> files and offsets under the queue, so the old DumpReplicationQueues
>> tool included all WAL files and offsets when outputting queue info. In
>> the new implementation we can also access the file system directly to
>> get all the WAL files corresponding to a queue, which makes the output
>> fully consistent with the old DumpReplicationQueues tool and does not
>> cost too much; but is it necessary?
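>> Rebuilding the old-style full WAL list from the single stored entry
>> could look like this (a simplified sketch; it assumes WAL file names
>> sort by their timestamp suffix, and the names used are illustrative):

```java
import java.util.*;

// Hypothetical sketch: the new queue storage keeps only the current WAL
// and offset, but WAL file names sort by their timestamp suffix, so
// listing the WAL directory and keeping everything from the current WAL
// onward recovers the remaining files for the dump output.
public class WalListSketch {
    static List<String> remainingWals(List<String> walsInDir, String currentWal) {
        List<String> sorted = new ArrayList<>(walsInDir);
        Collections.sort(sorted);
        List<String> remaining = new ArrayList<>();
        for (String wal : sorted) {
            // Keep the current WAL and every newer one.
            if (wal.compareTo(currentWal) >= 0) {
                remaining.add(wal);
            }
        }
        return remaining;
    }
}
```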
>>
>> It is recommended that the output of the WAL files and offset info
>> stay consistent with the old version, to avoid users failing to upgrade
>> their HBase version because they depend on the output of the
>> DumpReplicationQueues tool.
>>
>> Thanks.
>>
>>
>>
>>
>> On 2022-08-31 00:01:22, "何良均" <[email protected]> wrote:
>>
>> Last time we discussed the design doc "Move replication queue storage
>> from zookeeper to a separated HBase table", but the replication tool
>> part was not discussed, so this time we decided to cover it.
>>
>>
>> We plan to hold an online meeting from 2 PM to 3 PM, 31 Aug, GMT+8,
>> using Tencent Meeting.
>>
>>
>> 何良均 invites you to a Tencent Meeting
>> Topic: replication tool discussion
>> Time: 2022/08/31 14:00-15:00 (GMT+08:00) China Standard Time - Beijing
>>
>>
>> Click the link to join, or add it to your meeting list:
>> https://meeting.tencent.com/dm/norZvACxGtya
>>
>>
>> #Tencent Meeting ID: 982-412-761
>> Meeting password: 210189
>>
>>
>> More attendees are always welcomed.
>>
>>
