Re: [DISCUSS] Move replication queue storage from zookeeper to a separated HBase table

Duo Zhang Mon, 28 Nov 2022 07:51:33 -0800

Thanks Liangjun for putting this up. I think the discussion was very
constructive. We found some blind spots in the previous discussions;
for example, after the ReplicationSyncUp tool is done, all the data in
hbase:replication is useless.


Later we also need to modify the design doc accordingly.

Liangjun He <[email protected]> wrote on Mon, Nov 28, 2022 at 23:34:
>
> Attendees: Duo Zhang, Yu Li, Liangjun He
>
> First, Liangjun recapped the previous discussion of the ReplicationSyncUp
> tool (see: https://lists.apache.org/thread/1yzy60wbgomvlhlbocps1jklc0x5t349).
> In that proposal, the ReplicationSyncUp tool removes the ZK dependency by
> reading the latest snapshot of the hbase:replication table, and generates a
> new snapshot of the table after ReplicationSyncUp finishes. In addition,
> when the master cluster recovers, HMaster must be started before the
> RegionServers so that it can restore the hbase:replication snapshot to a
> table. Duo thinks that using snapshots is relatively complicated, and
> suggests flushing the hbase:replication table periodically instead, so that
> the ReplicationSyncUp tool can read the table data directly. We then
> discussed this approach; the main points are as follows:
>
> 1. How does the ReplicationSyncUp tool read the data of the hbase:replication
> table if we rely on periodically flushing the hbase:replication table?
>
> When the ReplicationSyncUp tool is executed, the master cluster is down.
> Because the hbase:replication table is flushed periodically,
> ReplicationSyncUp can read the hbase:replication table data directly
> offline. This approach has no technical challenges and is simpler. Of
> course, the flush approach has the same problem as the snapshot approach:
> because flushes run periodically, the flushed data lags behind the actual
> replication progress, so some redundant data will be replicated to the
> slave cluster when ReplicationSyncUp is executed.
>
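To make the redundancy in point 1 concrete, here is a minimal sketch in Java. It is only a toy model, not HBase code: WAL entries are numbered with plain sequence ids, and the class name, method names, and the numbers used are all assumptions for illustration. The offline tool only sees the last flushed offset, so it re-ships everything after that offset, including entries the slave cluster had already received before the crash.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (hypothetical names): replication progress is a single sequence id.
final class FlushLagSketch {

    /** Entries the offline tool ships: everything after the last flushed offset. */
    static List<Long> entriesToReplay(long lastFlushedOffset, long walEnd) {
        List<Long> entries = new ArrayList<>();
        for (long seq = lastFlushedOffset + 1; seq <= walEnd; seq++) {
            entries.add(seq);
        }
        return entries;
    }

    /** How many of the replayed entries the slave cluster already had. */
    static long redundantCount(long lastFlushedOffset, long actualReplicatedOffset) {
        return Math.max(0, actualReplicatedOffset - lastFlushedOffset);
    }

    public static void main(String[] args) {
        long walEnd = 100;   // last entry written before the cluster went down
        long flushed = 80;   // progress visible in the last flush
        long actual = 95;    // progress really reached, lost with the memstore

        System.out.println(entriesToReplay(flushed, walEnd).size()); // 20
        System.out.println(redundantCount(flushed, actual));         // 15
    }
}
```

In this model a shorter flush interval narrows the gap between the flushed and actual offsets, but it does not remove it, which is why the same caveat applies to the snapshot approach.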
> 2. How to modify the hbase:replication table data after ReplicationSyncUp is 
> executed?
>
> After ReplicationSyncUp is executed, all the data that the master cluster
> needed to replicate has been replicated. In theory, the data in the
> hbase:replication table then needs to be cleaned up. When ReplicationSyncUp
> is executed, a flag can be written to the file system, and when the HMaster
> of the master cluster recovers, it can clean the data in the
> hbase:replication table according to this flag. After cleaning, we must
> delete this flag to avoid cleaning the hbase:replication table repeatedly.
>
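The flag handshake in point 2 could look roughly like the sketch below. This is an illustration under stated assumptions, not the planned implementation: it uses java.nio.file on a local temp directory instead of the real cluster file system, a Map stands in for the hbase:replication table, and the flag name and all method names are made up.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

final class SyncUpFlagSketch {
    // Hypothetical marker name; the thread does not specify one.
    static final String FLAG_NAME = "replication-sync-up-done";

    /** Offline tool side: record that every pending edit has been shipped. */
    static void writeFlag(Path rootDir) {
        try {
            Path flag = rootDir.resolve(FLAG_NAME);
            if (Files.notExists(flag)) {
                Files.createFile(flag);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Master startup side: clean once if flagged. Returns true if cleanup ran. */
    static boolean cleanIfFlagged(Path rootDir, Map<String, Long> queueOffsets) {
        try {
            Path flag = rootDir.resolve(FLAG_NAME);
            if (Files.notExists(flag)) {
                return false;
            }
            queueOffsets.clear(); // stand-in for deleting the rows of hbase:replication
            Files.delete(flag);   // without this, every restart would clean again
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Runs the scenario once: {first call cleaned, second call cleaned, table empty}. */
    static boolean[] demo() {
        try {
            Path root = Files.createTempDirectory("sync-up-flag-sketch");
            Map<String, Long> queueOffsets = new HashMap<>();
            queueOffsets.put("peer1/rs1", 42L);
            writeFlag(root);
            boolean first = cleanIfFlagged(root, queueOffsets);
            boolean second = cleanIfFlagged(root, queueOffsets);
            Files.delete(root);
            return new boolean[] {first, second, queueOffsets.isEmpty()};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] r = demo();
        System.out.println(r[0] + " " + r[1] + " " + r[2]); // true false true
    }
}
```

The second call returning false is the point of deleting the flag: a later master restart must not wipe queue data that was written after the cluster came back.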
> 3. Does the data cleaning of the hbase:replication table require that the 
> HMaster be started before the RegionServer when the master cluster recovers 
> to avoid inconsistency of hbase:replication data?
>
> HMaster does not need to be started before RegionServer for two reasons:
>
> a. If the RegionServer is started first, it will stay in the
> initialization state until the HMaster is started. No regions are assigned
> to it, so no data needs to be replicated, and the hbase:replication table
> will not be modified;
>
> b. If the RegionServer is started first, it will not claim the replication
> queue of a dead RegionServer, because claiming is launched in the
> ServerCrashProcedure, and ServerCrashProcedure is executed by HMaster.
>
>
>
>
> Thanks.
>
>
>
>
> At 2022-11-22 11:33:13, "Liangjun He" <[email protected]> wrote:
>
> Last time we discussed the replication tool in the design doc "Move
> replication queue storage from zookeeper to a separated HBase table".
> However, we think that the implementation of the ReplicationSyncUp tool is
> slightly complicated, so we decided to discuss it again separately.
>
>
>
>
> We plan to hold an online meeting from 7PM to 8PM, 23 Nov 2022, GMT +8,
> using Tencent Meeting.
>
>
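> Liangjun He invites you to a Tencent Meeting
> Meeting topic: ReplicationSyncUp discussion
> Meeting time: 2022/11/23 19:00-20:00 (GMT+08:00) China Standard Time - Beijing
>
> Click the link to join the meeting, or add it to your meeting list:
> https://meeting.tencent.com/dm/uAO9OU5ghD3y
>
> #Tencent Meeting: 138-478-728
> Meeting password: 432745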
>
>
> More attendees are always welcome.
