Thanks for the detailed write up!

On Thu, Sep 1, 2022 at 23:33, 何良均 <[email protected]> wrote:
> Meeting notes:
>
> Attendees: Duo Zhang, Yu Li, Xin Sun, Tianhang Tang, Liangjun He
>
> First, Liangjun introduced the old implementation of
> ReplicationSyncUp/DumpReplicationQueues, as well as the existing problems
> and preliminary solutions under the new replication implementation. Then
> we discussed the possible solutions; the following is the content of the
> discussion.
>
> ReplicationSyncUp tool
>
> 1. The ReplicationSyncUp tool can replicate the remaining data to the
> backup cluster when the master cluster crashes, but if the master cluster
> crashes, the tool cannot access the hbase:replication table. Is it
> possible to copy the replication queue info to ZK whenever it is written
> to the hbase table, and then implement the ReplicationSyncUp tool on top
> of ZK again?
>
> Since our goal is to reduce the reliance on ZK for storing replication
> queue info, this would defeat that goal. Maybe we could use HMaster
> maintenance mode to bring up hbase:meta and then perform additional
> repair operations, but HMaster maintenance mode only allows the HMaster
> itself to access hbase:meta; external clients cannot, so this approach
> does not work either.
>
> After discussion, we all agreed that without relying on external storage,
> the problem is hard to solve under the new replication implementation.
> And if ZK (or some third-party storage system) is used, we have a data
> sync problem: how to sync the replication queue info to ZK (with
> real-time syncing, how do we ensure consistency between the writes to the
> hbase table and to ZK; with periodic syncing, some redundant data will be
> replicated when ReplicationSyncUp is executed, though partially redundant
> data may be acceptable), and, when the master cluster is recovered, how
> to sync the replication queue info from ZK back to the hbase:replication
> table.
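To make the periodic-sync tradeoff above concrete, here is a toy in-memory model. All names are illustrative; this is not real HBase or ZK API code. It shows why a lagging ZK copy causes redundant, but not lost, replication: ReplicationSyncUp simply restarts from the older offset held in ZK.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the periodic-sync idea: the authoritative queue offsets
// live in the hbase:replication table, and a periodic job copies them to
// ZK. Offsets are simplified to one long per peer.
class TimedSyncModel {
    final Map<String, Long> tableOffsets = new HashMap<>(); // hbase:replication
    final Map<String, Long> zkOffsets = new HashMap<>();    // ZK copy

    // A RegionServer persists a new offset: only the table is written.
    void recordOffset(String peer, long offset) {
        tableOffsets.put(peer, offset);
    }

    // Periodic sync job: copy the current table state to ZK.
    void timedSync() {
        zkOffsets.putAll(tableOffsets);
    }

    // After the master cluster crashes, ReplicationSyncUp can only see
    // the ZK copy, which may lag behind the table; replication restarts
    // from the older offset, so some entries are shipped twice.
    long syncUpStartOffset(String peer) {
        return zkOffsets.getOrDefault(peer, 0L);
    }

    public static void main(String[] args) {
        TimedSyncModel m = new TimedSyncModel();
        m.recordOffset("peer1", 100L);
        m.timedSync();                 // ZK copy now at 100
        m.recordOffset("peer1", 250L); // table at 250, crash before next sync
        System.out.println(m.syncUpStartOffset("peer1")); // prints 100
    }
}
```

The gap between 100 and 250 in this toy run is exactly the "partially redundant data" the notes call acceptable: the entries in that range are shipped again rather than skipped.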
> Further, we could also let ReplicationSyncUp access the replication
> queue info through snapshots of the hbase:replication table. For
> example, a snapshot of the hbase:replication table is generated
> periodically, and when the ReplicationSyncUp tool is executed, it loads
> the snapshot into memory. After the ReplicationSyncUp tool has finished
> and the remaining data has been replicated, we regenerate a new snapshot
> from the in-memory info and write it to the file system. When the master
> cluster is recovered, the HMaster restores the hbase:replication table
> from the new snapshot.
>
> 2. If the ReplicationSyncUp tool is implemented on top of the
> hbase:replication snapshot, then after ReplicationSyncUp has been
> executed and the data has been fully replicated, we must ensure that the
> HMaster starts first when the master cluster is recovered and restores
> the snapshot to hbase:replication, to avoid the situation where a
> RegionServer starts first and replicates redundant data to the backup
> cluster. But the master cluster cannot guarantee that the HMaster starts
> before the RegionServers, so how do we ensure that the HMaster restores
> the snapshot to the hbase:replication table first?
>
> Option 1: If a RegionServer starts first and finds a corresponding
> snapshot of the hbase:replication table, its replication-related
> operations wait until the HMaster has started and restored the snapshot
> to the hbase:replication table; then the RegionServer continues
> replicating data. The advantage of this approach is that it is
> transparent to the user, but the implementation is complicated.
>
> Option 2: After ReplicationSyncUp has been executed, we disable the peer.
> Even if a RegionServer starts first, replication will not run until the
> HMaster has started and restored the snapshot to the hbase:replication
> table. The disadvantage of this approach is that it may confuse users,
> because the peer is disabled without the user knowing; the advantage is
> that the implementation is simple.
>
> At present, it seems that no solution solves the problem perfectly.
>
> DumpReplicationQueues tool
>
> 1. Under the new replication implementation, most of the info output by
> the DumpReplicationQueues tool can be obtained through the new
> interface, consistent with the old implementation. The difference is
> that each queue in the new implementation stores only one WAL and the
> corresponding offset, while the old implementation stored all the WAL
> files and offsets for a queue, so the old DumpReplicationQueues tool
> included all WAL files and offset info in its queue output. In the new
> implementation we can also access the file system directly to get all
> the WAL files belonging to a queue, which makes the output fully
> consistent with the old DumpReplicationQueues tool at little cost, but
> is it necessary?
>
> It is recommended that the WAL file and offset output stay consistent
> with the old version, to avoid users failing to upgrade their HBase
> version because they depend on the output of the DumpReplicationQueues
> tool.
>
> Thanks.
>
> On 2022-08-31 00:01:22, "何良均" <[email protected]> wrote:
>
> > Last time we discussed the design doc "Move replication queue storage
> > from zookeeper to a separated HBase table", but the replication tool
> > part was not discussed. This time we decided to discuss this part.
> >
> > We plan to hold an online meeting from 2 PM to 3 PM, 31 Aug, GMT+8,
> > using Tencent Meeting.
> > 何良均 invites you to a Tencent Meeting.
> >
> > Topic: replication tool discussion
> > Time: 2022/08/31 14:00-15:00 (GMT+08:00) China Standard Time - Beijing
> >
> > Click the link to join the meeting, or add it to your meeting list:
> > https://meeting.tencent.com/dm/norZvACxGtya
> >
> > #Tencent Meeting ID: 982-412-761
> > Meeting password: 210189
> >
> > More attendees are always welcome.
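One more thought on the snapshot-based ReplicationSyncUp idea above: the load / advance / re-save cycle could look roughly like the toy sketch below. This is only an illustration with a made-up one-line-per-queue text format; it is not the actual HBase snapshot format or API, and the class and method names are invented for the example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy sketch of the snapshot flow: ReplicationSyncUp loads a snapshot of
// queue offsets into memory, advances them while draining the remaining
// WAL entries, and writes a new snapshot for the recovered HMaster to
// restore into hbase:replication. Format: one "peer=offset" line per queue.
class QueueSnapshot {
    final Map<String, Long> offsets = new TreeMap<>();

    static QueueSnapshot load(Path file) throws IOException {
        QueueSnapshot s = new QueueSnapshot();
        for (String line : Files.readAllLines(file)) {
            String[] kv = line.split("=", 2);
            s.offsets.put(kv[0], Long.parseLong(kv[1]));
        }
        return s;
    }

    // Called as ReplicationSyncUp ships entries for a peer; a stale
    // offset must never move a queue backwards, hence the max-merge.
    void advance(String peer, long newOffset) {
        offsets.merge(peer, newOffset, Math::max);
    }

    void save(Path file) throws IOException {
        List<String> lines = new ArrayList<>();
        offsets.forEach((p, o) -> lines.add(p + "=" + o));
        Files.write(file, lines);
    }

    public static void main(String[] args) throws IOException {
        Path f = Files.createTempFile("queue-snapshot", ".txt");
        Files.write(f, Arrays.asList("peer1=100"));
        QueueSnapshot s = QueueSnapshot.load(f);
        s.advance("peer1", 250L); // sync-up shipped entries up to offset 250
        s.save(f);
        System.out.println(Files.readAllLines(f)); // prints [peer1=250]
    }
}
```

On recovery, the HMaster would read the newest such snapshot and write the offsets back into hbase:replication, which is exactly the restore step whose ordering Options 1 and 2 above try to guarantee.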
