[ 
https://issues.apache.org/jira/browse/IOTDB-3452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17553961#comment-17553961
 ] 

QiangShaowei commented on IOTDB-3452:
-------------------------------------

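Latest test results: region migration fails intermittently.
Node 91 is being removed (scale-in), so all regions on node 91 must be migrated.
1. The ConfigNode received a region migration failure report; the cause is that addPeer failed: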
2022-06-14 15:13:00,353 [pool-2-IoTDB-ConfigNodeRPC-Client-13] DEBUG 
o.a.i.c.m.DataNodeRemoveService:534 - accept region: 
{color:red}TConsensusGroupId(type:DataRegion, id:5) {color}migrate result, 
result: 
TRegionMigrateResultReportReq(regionId:TConsensusGroupId(type:DataRegion, 
id:5), migrateResult:TSStatus(code:911, message:add peer: 
TEndPoint(ip:192.168.42.128, port:40010) failed. exception: 
Ratis request failed), failedNodeAndReason:{TDataNodeLocation(dataNodeId:0, 
externalEndPoint:TEndPoint(ip:192.168.42.128, port:6667), 
internalEndPoint:TEndPoint(ip:192.168.42.128, port:9003), 
dataBlockManagerEndPoint:TEndPoint(ip:192.168.42.128, port:8777), 
dataRegionConsensusEndPoint:TEndPoint(ip:192.168.42.128, port:40010), 
schemaRegionConsensusEndPoint:TEndPoint(ip:192.168.42.128, 
port:50010))={color:red}AddPeerFailed{color}})
2. Node 91 received the request and started executing addPeer:
2022-06-14 15:11:51,447 [pool-20-IoTDB-Region-Migrate-Pool-1] DEBUG 
o.a.i.d.s.RegionMigrateService$RegionMigrateTask:240 - start to add peer 
TDataNodeLocation(dataNodeId:0, externalEndPoint:TEndPoint(ip:192.168.42.128, 
port:6667), internalEndPoint:TEndPoint(ip:192.168.42.128, port:9003), 
dataBlockManagerEndPoint:TEndPoint(ip:192.168.42.128, port:8777), 
dataRegionConsensusEndPoint:TEndPoint(ip:192.168.42.128, port:40010), 
schemaRegionConsensusEndPoint:TEndPoint(ip:192.168.42.128, port:50010)) for 
region TConsensusGroupId(type:DataRegion, id:5)
{color:red}No log indicating that addPeer succeeded appears afterwards.
{color}The following exception was logged:
2022-06-14 15:11:51,461 [pool-20-IoTDB-Region-Migrate-Pool-1] DEBUG 
o.a.r.c.impl.BlockingImpl:139 - client-3C3E663D2B7C: receive 
RaftClientReply:client-3C3E663D2B7C->192.168.42.91_40010@group-000100000005, 
cid=8, FAILED org.apache.ratis.protocol.exceptions.NotLeaderException: Server 
192.168.42.91_40010@group-000100000005 is not the leader, logIndex=0, 
commits[192.168.42.91_40010:c8, 192.168.42.74_40010:c5]
2022-06-14 15:11:51,461 [pool-20-IoTDB-Region-Migrate-Pool-1] DEBUG 
o.a.r.c.i.RaftClientImpl:355 - client-3C3E663D2B7C: suggested new leader: null. 
Failed 
SetConfigurationRequest:client-3C3E663D2B7C->192.168.42.91_40010@group-000100000005,
 cid=8, seq=0, RW, null, 
peers:[192.168.42.74_40010|rpc:192.168.42.74:40010|admin:|client:|dataStream:|priority:1,
 
192.168.42.91_40010|rpc:192.168.42.91:40010|admin:|client:|dataStream:|priority:0,
 192.168.42.128_40010|rpc:192.168.42.128:40010|priority:0] with {}
org.apache.ratis.protocol.exceptions.NotLeaderException: Server 
192.168.42.91_40010@group-000100000005 is not the leader
        at 
org.apache.ratis.client.impl.ClientProtoUtils.toRaftClientReply(ClientProtoUtils.java:378)
        at 
org.apache.ratis.grpc.client.GrpcClientRpc.sendRequest(GrpcClientRpc.java:102)
        at 
org.apache.ratis.client.impl.BlockingImpl.sendRequest(BlockingImpl.java:132)
        at 
org.apache.ratis.client.impl.BlockingImpl.sendRequestWithRetry(BlockingImpl.java:98)
        at 
org.apache.ratis.client.impl.AdminImpl.setConfiguration(AdminImpl.java:47)
        at 
org.apache.ratis.client.api.AdminApi.setConfiguration(AdminApi.java:36)
        at 
org.apache.iotdb.consensus.ratis.RatisConsensus.sendReconfiguration(RatisConsensus.java:600)
        at 
org.apache.iotdb.consensus.ratis.RatisConsensus.addPeer(RatisConsensus.java:347)
        at 
org.apache.iotdb.db.service.RegionMigrateService$RegionMigrateTask.addNewPeer(RegionMigrateService.java:248)
        at 
org.apache.iotdb.db.service.RegionMigrateService$RegionMigrateTask.run(RegionMigrateService.java:159)
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)



> During DataNode scale-in, the Ratis server exits while a Region is being migrated
> --------------------------------------
>
>                 Key: IOTDB-3452
>                 URL: https://issues.apache.org/jira/browse/IOTDB-3452
>             Project: Apache IoTDB
>          Issue Type: Test
>            Reporter: QiangShaowei
>            Assignee: Song Ziyang
>            Priority: Major
>         Attachments: 128_log_all.rar, 74_log_all.rar, 91_log_all.rar, 
> confignode_log_all.rar
>
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)
