[
https://issues.apache.org/jira/browse/IOTDB-3452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17553961#comment-17553961
]
QiangShaowei commented on IOTDB-3452:
-------------------------------------
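Latest test results: region migration fails intermittently.
Node 91 is being removed (scale-in), so all regions on 91 have to be migrated.
1. The ConfigNode received a region migration failure report; the cause is that addPeer failed: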
2022-06-14 15:13:00,353 [pool-2-IoTDB-ConfigNodeRPC-Client-13] DEBUG
o.a.i.c.m.DataNodeRemoveService:534 - accept region:
{color:red}TConsensusGroupId(type:DataRegion, id:5) {color}migrate result,
result:
TRegionMigrateResultReportReq(regionId:TConsensusGroupId(type:DataRegion,
id:5), migrateResult:TSStatus(code:911, message:add peer:
TEndPoint(ip:192.168.42.128, port:40010) failed. exception:
Ratis request failed), failedNodeAndReason:{TDataNodeLocation(dataNodeId:0,
externalEndPoint:TEndPoint(ip:192.168.42.128, port:6667),
internalEndPoint:TEndPoint(ip:192.168.42.128, port:9003),
dataBlockManagerEndPoint:TEndPoint(ip:192.168.42.128, port:8777),
dataRegionConsensusEndPoint:TEndPoint(ip:192.168.42.128, port:40010),
schemaRegionConsensusEndPoint:TEndPoint(ip:192.168.42.128,
port:50010))={color:red}AddPeerFailed{color}})
2. Node 91 had already received the request and started executing addPeer:
2022-06-14 15:11:51,447 [pool-20-IoTDB-Region-Migrate-Pool-1] DEBUG
o.a.i.d.s.RegionMigrateService$RegionMigrateTask:240 - start to add peer
TDataNodeLocation(dataNodeId:0, externalEndPoint:TEndPoint(ip:192.168.42.128,
port:6667), internalEndPoint:TEndPoint(ip:192.168.42.128, port:9003),
dataBlockManagerEndPoint:TEndPoint(ip:192.168.42.128, port:8777),
dataRegionConsensusEndPoint:TEndPoint(ip:192.168.42.128, port:40010),
schemaRegionConsensusEndPoint:TEndPoint(ip:192.168.42.128, port:50010)) for
region TConsensusGroupId(type:DataRegion, id:5)
{color:red}After this, there is no log entry showing that addPeer succeeded.
{color}The following exception appears:
2022-06-14 15:11:51,461 [pool-20-IoTDB-Region-Migrate-Pool-1] DEBUG
o.a.r.c.impl.BlockingImpl:139 - client-3C3E663D2B7C: receive
RaftClientReply:client-3C3E663D2B7C->192.168.42.91_40010@group-000100000005,
cid=8, FAILED org.apache.ratis.protocol.exceptions.NotLeaderException: Server
192.168.42.91_40010@group-000100000005 is not the leader, logIndex=0,
commits[192.168.42.91_40010:c8, 192.168.42.74_40010:c5]
2022-06-14 15:11:51,461 [pool-20-IoTDB-Region-Migrate-Pool-1] DEBUG
o.a.r.c.i.RaftClientImpl:355 - client-3C3E663D2B7C: suggested new leader: null.
Failed
SetConfigurationRequest:client-3C3E663D2B7C->192.168.42.91_40010@group-000100000005,
cid=8, seq=0, RW, null,
peers:[192.168.42.74_40010|rpc:192.168.42.74:40010|admin:|client:|dataStream:|priority:1,
192.168.42.91_40010|rpc:192.168.42.91:40010|admin:|client:|dataStream:|priority:0,
192.168.42.128_40010|rpc:192.168.42.128:40010|priority:0] with {}
org.apache.ratis.protocol.exceptions.NotLeaderException: Server
192.168.42.91_40010@group-000100000005 is not the leader
    at org.apache.ratis.client.impl.ClientProtoUtils.toRaftClientReply(ClientProtoUtils.java:378)
    at org.apache.ratis.grpc.client.GrpcClientRpc.sendRequest(GrpcClientRpc.java:102)
    at org.apache.ratis.client.impl.BlockingImpl.sendRequest(BlockingImpl.java:132)
    at org.apache.ratis.client.impl.BlockingImpl.sendRequestWithRetry(BlockingImpl.java:98)
    at org.apache.ratis.client.impl.AdminImpl.setConfiguration(AdminImpl.java:47)
    at org.apache.ratis.client.api.AdminApi.setConfiguration(AdminApi.java:36)
    at org.apache.iotdb.consensus.ratis.RatisConsensus.sendReconfiguration(RatisConsensus.java:600)
    at org.apache.iotdb.consensus.ratis.RatisConsensus.addPeer(RatisConsensus.java:347)
    at org.apache.iotdb.db.service.RegionMigrateService$RegionMigrateTask.addNewPeer(RegionMigrateService.java:248)
    at org.apache.iotdb.db.service.RegionMigrateService$RegionMigrateTask.run(RegionMigrateService.java:159)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
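
The trace above shows the SetConfigurationRequest being sent to 192.168.42.91_40010 while that server was not the leader of group-000100000005, and the reply carried no leader hint (suggested new leader: null). One possible mitigation, which is only an assumption and not a confirmed fix for this issue, is to retry the reconfiguration a bounded number of times with a growing backoff so that a leader election can finish. The sketch below shows that retry pattern in plain Java. AddPeerRetrySketch, NotLeaderLikeException and retryOnNotLeader are hypothetical names used for illustration; they are not IoTDB or Ratis APIs, and the real call to RatisConsensus.addPeer is replaced by a simulated operation.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

/** Sketch only: retries an operation while the contacted peer reports that it is not the leader. */
class AddPeerRetrySketch {

  /** Hypothetical stand-in for org.apache.ratis.protocol.exceptions.NotLeaderException. */
  static class NotLeaderLikeException extends Exception {
    NotLeaderLikeException(String message) {
      super(message);
    }
  }

  /**
   * Calls op up to maxAttempts times. Only NotLeaderLikeException triggers a retry;
   * any other exception propagates immediately. The wait grows linearly per attempt.
   */
  static <T> T retryOnNotLeader(Callable<T> op, int maxAttempts, long backoffMillis)
      throws Exception {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    NotLeaderLikeException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return op.call();
      } catch (NotLeaderLikeException e) {
        last = e;
        if (attempt < maxAttempts) {
          // Give the group time to elect a leader before trying again.
          Thread.sleep(backoffMillis * attempt);
        }
      }
    }
    // All attempts failed with "not leader"; surface the last failure to the caller.
    throw last;
  }

  public static void main(String[] args) throws Exception {
    AtomicInteger calls = new AtomicInteger();
    // Simulated addPeer: the first two calls hit a non-leader, the third succeeds.
    String result = retryOnNotLeader(() -> {
      if (calls.incrementAndGet() < 3) {
        throw new NotLeaderLikeException("Server is not the leader");
      }
      return "peer added";
    }, 5, 10L);
    System.out.println(result + " after " + calls.get() + " attempts");
  }
}
```

In a real deployment the operation passed to retryOnNotLeader would wrap the addPeer call for region DataRegion id:5, and a failure would be reported to the ConfigNode only after the retries are exhausted. Whether this is appropriate depends on why no leader was available at 15:11:51, which the attached logs would need to confirm.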
> During DataNode scale-in, the ratis server exits while a Region is being migrated
> --------------------------------------
>
> Key: IOTDB-3452
> URL: https://issues.apache.org/jira/browse/IOTDB-3452
> Project: Apache IoTDB
> Issue Type: Test
> Reporter: QiangShaowei
> Assignee: Song Ziyang
> Priority: Major
> Attachments: 128_log_all.rar, 74_log_all.rar, 91_log_all.rar,
> confignode_log_all.rar
>
>
--
This message was sent by Atlassian Jira
(v8.20.7#820007)