ninsmiracle commented on code in PR #40:
URL:
https://github.com/apache/incubator-pegasus-website/pull/40#discussion_r1436192907
##########
_docs/zh/administration/duplication.md:
##########
@@ -72,110 +82,110 @@ duplications of app [account_xiaomi] are listed as below:
To meet this requirement, our approach is:
1. First, the source cluster **retains all write increments from this moment on** (i.e. the WAL logs).
-2. Upload the full snapshot (cold backup) of the source cluster to backup storage such as HDFS / xiaomi-FDS.
-3. Then restore it to the target cluster.
-4. After that, the source cluster enables duplication and replicates the previously accumulated write increments to the remote target cluster.
+2. Move the full snapshot (existing data) of the source cluster to the designated path, and wait for the follower cluster (target cluster) to learn this data.
+3. Once the target cluster has finished learning the existing data, it notifies the source cluster to enter the WAL log shipping phase.
+4. After that, the source cluster enables duplication and ships the previously accumulated write increments to the remote target cluster.
-```
- +-----Source Table------+
- | |
- | +---------+ |
- 2. Backup | | | |
-+----------+ | | | |
-| | | | RocksDB | +-----+ |
-| snapshot +<------+ Store | | | |
-| | | | | | WAL +<-------+ 1. No GC
-+------+---+ | | | | | |
- | | +---------+ +---+-+ |
- | | | |
- | +-----------------------+
- | |
- | | 4. Start duplication
- | |
- | +-----------------v----+
- | | |
- +-------->+ |
- 3. Restore | |
- +------Dest Table------+
-```
+The interaction between the master cluster and the follower cluster (each with its meta server and its primary replicas) proceeds as follows:
+
+0. (master primary) During initialization, the replica server creates periodic RPC tasks to exchange duplication status between the replica server and the meta server. (master meta) On receiving the RPC from the replica server, the meta server aggregates the duplication tasks and their progress, and replies to the replica server.
+1. (master meta) An add_duplication request is issued to add a duplication task for the table, registering the corresponding dup_info.
+2. (master meta) **Enter state DS_PREPARE** and synchronize the checkpoint.
+3. (master primary) On learning of the new dup_info on the meta server, create a replica_duplicator and call **trigger_manual_emergency_checkpoint** to generate a checkpoint.
+4. (master meta) Once the replica servers report that all checkpoints have been generated, start creating the table on the follower cluster via create_follower_app_for_duplication, sending an RPC_CM_CREATE_APP request that carries the master table's info.
+5. (follower meta) On receiving the RPC_CM_CREATE_APP request, start creating the table: duplicate_checkpoint.
+6. (follower primary) Initialize the table from the master table's checkpoint: send a request to pull the checkpoint, implemented at the bottom layer by an nfs copy via async_duplicate_checkpoint_from_master_replica. On success, the follower reports that the table was created and returns ERR_OK.
+7. (master meta) On receiving the ERR_OK reply, **enter the DS_APP state**.
+8. (master meta) In the next round of communication, while in the DS_APP state, verify the created table via check_follower_app_if_create_completed; if everything checks out, **enter the DS_LOG state**.
+9. (master primary) When the replica server first learns that the status has switched to DS_LOG, it starts duplicating the plog data: start_dup_log.
+10. (master primary) load: replay and load the logs; ship: package and send them.
+11. (follower primary) Acting as the server side, receive the shipped packages, unpack them, and process them according to the specific RPC types they contain: pegasus_write_service::duplicate.
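As a rough illustration of this state machine, the `status` field reported by `query_dup` should advance as the steps complete. A minimal sketch, where the `dup_status` helper is hypothetical (not part of Pegasus) and the sample JSON line is modeled on the `query_dup` output shown later in this document:

```sh
# Hypothetical helper: extract the "status" field from one line of
# query_dup JSON output. As the steps above complete, the status is
# expected to advance DS_PREPARE -> DS_APP -> DS_LOG.
dup_status() {
  grep -o '"status":"[^"]*"' | head -n1 | cut -d':' -f2 | tr -d '"'
}

# Sample line, modeled on the query_dup output shown in this document:
echo '{"dupid":1548442533,"status":"DS_PREPARE","remote":"c4srv-feedhistory"}' | dup_status
# prints: DS_PREPARE
```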
-### Step 1
-How do we retain all write increments from this moment on? We can proceed as follows:
-First run `add_dup [--freezed/-f]`, which means log duplication is not performed yet; the principle is to block the current log GC (log compaction). This operation **must be executed first**, otherwise data integrity cannot be guaranteed.
-```sh
-## bjsrv-account
->>> add_dup account_xiaomi tjsrv-account --freezed
-```
-Then each partition records its **current confirmed point (confirmed_decree)** and persists it to the MetaServer.
-Note that you must wait until all partitions have updated their confirmed points to the MetaServer before proceeding to the next step; this is the precondition for the correctness of this feature.
+The following describes the steps required to enable duplication for an online table.
-A `confirmed_decree` value of -1 means that the partition's confirmed point has not yet been synchronized.
+### Step 1: Configure duplication parameters on the clusters
-```
->>> query_dup -d account_xiaomi 1535008534
-{"dupid":1548442533,"status":"DS_START","remote":"c4srv-feedhistory","create_ts":1548442533763,"progress":[{"pid":0,"confirmed":-1},{"pid":1,"confirmed":276444333},{"pid":2,"confirmed":-1},{"pid":3,"confirmed":-1},{"pid":4,"confirmed":-1},{"pid":5,"confirmed":-1},{"pid":6,"confirmed":-1},{"pid":7,"confirmed":279069949},{"pid":8,"confirmed":-1}]}
+The relevant parameters under the replication and duplication-group sections **must be kept identical** on both the master and follower clusters. Here, the master cluster is the sender of the duplicated data, and the follower cluster is the receiver.
->>> query_dup -d account_xiaomi 1535008534
-{"dupid":1548442533,"status":"DS_START","remote":"c4srv-feedhistory","create_ts":1548442533763,"progress":[{"pid":0,"confirmed":276444111},{"pid":1,"confirmed":276444333},{"pid":2,"confirmed":276444332},{"pid":3,"confirmed":276444222},{"pid":4,"confirmed":276444111},{"pid":5,"confirmed":276444377},{"pid":6,"confirmed":276444388},{"pid":7,"confirmed":279069949},{"pid":8,"confirmed":276444399}]}
-```
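Since the next step must not begin until no partition still reports `confirmed: -1`, a check like the following could be scripted. A minimal sketch, where `all_confirmed` is a hypothetical helper (not a Pegasus command) operating on one line of `query_dup` JSON output:

```sh
# Hypothetical helper: succeed only when no partition in the given
# query_dup JSON line still reports "confirmed":-1, i.e. every
# partition has persisted its confirmed decree to the MetaServer.
# The [,}] suffix anchors the match so e.g. -123 does not match -1.
all_confirmed() {
  ! grep -q '"confirmed":-1[,}]'
}

# A real script would re-run query_dup in a loop until this passes:
if echo '{"progress":[{"pid":0,"confirmed":-1},{"pid":1,"confirmed":276444333}]}' | all_confirmed; then
  echo "all partitions confirmed"
else
  echo "still waiting"
fi
# prints: still waiting
```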
+Example configuration for the master cluster:
-### Steps 2 and 3
+```Shell
+[replication]
+ duplication_enabled = true
+    duplicate_log_batch_bytes = 4096  # 0 means no batching; 4096 is usually fine. This option can be changed dynamically via the server-config command of admin-cli
-Use the cold backup feature to upload the data snapshot to remote storage, then use the restore feature to restore the table on the target cluster (tjsrv-account). Example commands:
+[pegasus.clusters]
+    # The master cluster that enables duplication must configure the specific meta address of the follower cluster:
+ tjsrv-account = xxxxxxxxx
+    # Both clusters involved in duplication need to register the "cluster_id" of the source and destination clusters:
+ [[duplication-group]]
+ bjsrv-account = 1
+ tjsrv-account = 2
```
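Correspondingly, the follower cluster is expected to carry matching settings, since the parameters under these sections must be identical on both sides. A minimal sketch of the follower side, assuming the same duplication group as above:

```Shell
[replication]
  duplication_enabled = true

# The duplication-group registration must match the master cluster's:
[[duplication-group]]
  bjsrv-account = 1
  tjsrv-account = 2
```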
-# Immediately take a cold backup of the table (app_id = 12)
-./run.sh shell -n bjsrv-account
->>> add_backup_policy -p dup_transfer -b fds_wq -a 12 -i 86400 -s 12:01 -c 1
-
-# Wait patiently for the backup to be generated
->>> query_backup_policy -p dup_transfer
-policy_info:
- name : dup_transfer
- backup_provider_type : fds_wq
- backup_interval : 86400s
- app_ids : {12}
- start_time : 12:01
- status : enabled
- backup_history_count : 1
-backup_infos:
-[1]
- id : 1541649698875
- start_time : 2018-11-08 12:01:38
- end_time : 2018-11-08 12:03:51
- app_ids : {60}
-
-# Restore the table in the Tianjin data center
-./run.sh shell -n tjsrv-account
->>> restore_app -c bjsrv-account -p dup_transfer -a account_xiaomi -i 12 -t 1541649698875 -b fds_wq
+
+
Review Comment:
Yes, because the existing data no longer goes through external hdfs or fds. The first step only requires adding a few items to the cluster configuration.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]