[I] [Bug] When a master node switchover occurs on the Doris FE, CCR restarts fullsync [doris]

via GitHub Thu, 14 May 2026 18:33:30 -0700


dzmxcyr opened a new issue, #63269:
URL: https://github.com/apache/doris/issues/63269


   ### Search before asking
   
   - [x] I had searched in the [issues](https://github.com/apache/doris/issues?q=is%3Aissue) and found no similar issues.
   
   
   ### Version
   
   Source Doris: 2.1.11-x64
   Target Doris: 2.1.11-arm64
   CCR: ccr-syncer-3.0.6-rc05-arm64
   
   ### What's Wrong?
   
   After a database-level CCR replication task is created, CCR completes the initial full synchronization and then enters the incremental synchronization phase, and everything works properly.
   However, when the upstream Doris FE master role switches to another node, CCR triggers a new fullsync and pulls all data again from scratch.
   Because the data volume is large, this resynchronization takes a long time and has a significant impact on the production environment.
   
   Relevant CCR log:
   [2026-05-14 09:33:51.786] WARN call [:0] error: GetBinlog error: remote or network error: get connection error: dial tcp :0: connection has been closed by peer, req: TGetBinlogRequest({Cluster: User:0x40001a8378 Passwd:0x40001a8388 Db:0x40001a83a8 Table: TableId: UserIp: Token: PrevCommitSeq:0x400082e928 NumAcquired:0x400082e930}): [rpc] remote or network error: get connection error: dial tcp :0: connection has been closed by peer, try next addr job=CCR_PROD_ZHBB line=rpc/fe.go:259
   ...
   [2026-05-14 09:33:52.149] WARN job sync failed, job: CCR_PROD_DW, err: [meta] index ids is empty
   ...
   [2026-05-14 09:33:53.597] INFO fullsync status: create snapshot with prefix ccrs_CCR_PROD_DW_1778668141 job=CCR_PROD_DW line=ccr/job.go:973
   [2026-05-14 09:33:53.694] INFO fullsync status: create snapshot ccrs_CCR_PROD_DW_1778668141_1778722433 job=CCR_PROD_DW line=ccr/job.go:1019
   [2026-05-14 09:33:53.694] INFO create snapshot PROD_DW.ccrs_CCR_PROD_DW_1778668141_1778722433, backup snapshot sql: BACKUP SNAPSHOT PROD_DW.ccrs_CCR_PROD_DW_1778668141_1778722433 TO __keep_on_local__ PROPERTIES ("type" = "full") job=CCR_PROD_DW line=base/spec.go:771
   
   ### What You Expected?
   
   CCR continues running normally in the incremental synchronization phase, without triggering a new fullsync, after the Doris FE master fails over to another node.
   
   ### How to Reproduce?
   
   While database-level CCR synchronization is running against an upstream cluster that receives continuous writes to a large number of tables, take the FE master node down so that a master switchover occurs. CCR then triggers a fullsync again.
   
   ### Anything Else?
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [x] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   


