[ 
https://issues.apache.org/jira/browse/IOTDB-3452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17557783#comment-17557783
 ] 

QiangShaowei commented on IOTDB-3452:
-------------------------------------

昨天测试情况,发现在addPeer时失败
1. ConfigNode 2022-06-22 16:19:45发起region迁移请求,2022-06-22 16:20:54收到迁移失败,原因是add 
peer 失败。

{color:red}2022-06-22 16:19:45,001{color} [Exec-RemoveDataNode-Thread] DEBUG 
o.a.i.c.m.DataNodeRemoveManager:449 - send region 
TConsensusGroupId(type:DataRegion, id:11) migrate action to 
TDataNodeLocation(dataNodeId:2, externalEndPoint:TEndPoint(ip:192.168.42.196, 
port:6697), internalEndPoint:TEndPoint(ip:192.168.42.196, port:9009), 
dataBlockManagerEndPoint:TEndPoint(ip:192.168.42.196, port:8797), 
dataRegionConsensusEndPoint:TEndPoint(ip:192.168.42.196, port:40090), 
schemaRegionConsensusEndPoint:TEndPoint(ip:192.168.42.196, port:50090)), wait 
it finished

{color:red}2022-06-22 16:20:54,065 
[{color}pool-2-IoTDB-ConfigNodeRPC-Client-16] DEBUG 
o.a.i.c.m.DataNodeRemoveManager:498 - accept region 
TConsensusGroupId(type:DataRegion, id:11) migrate result, result: 
TRegionMigrateResultReportReq(regionId:TConsensusGroupId(type:DataRegion, 
id:11), migrateResult:TSStatus(code:911, message:add peer: 
TEndPoint(ip:192.168.42.16, port:40090) for region: DataRegion[11] failed. 
exception: Ratis request failed), 
failedNodeAndReason:{TDataNodeLocation(dataNodeId:1, 
externalEndPoint:TEndPoint(ip:192.168.42.16, port:6697), 
internalEndPoint:TEndPoint(ip:192.168.42.16, port:9009), 
dataBlockManagerEndPoint:TEndPoint(ip:192.168.42.16, port:8797), 
dataRegionConsensusEndPoint:TEndPoint(ip:192.168.42.16, port:40090), 
schemaRegionConsensusEndPoint:TEndPoint(ip:192.168.42.16, 
port:50090))={color:red}AddPeerFailed{color}})

2.要缩容的节点196上,执行region 11 的 addPeer时的jstack如下,会执行setConfiguration方法
"pool-20-IoTDB-Region-Migrate-Pool-1"  prio=5 os_prio=0 tid=0x0000fffe2403f800 
nid=0x59f9 waiting on condition [0x0000fffd6b9f6000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at java.lang.Thread.sleep(Thread.java:342)
at java.util.concurrent.TimeUnit.sleep(TimeUnit.java:386)
at org.apache.ratis.util.TimeDuration.sleep(TimeDuration.java:325)
at org.apache.ratis.util.TimeDuration.sleep(TimeDuration.java:310)
at 
org.apache.ratis.client.impl.BlockingImpl.sendRequestWithRetry(BlockingImpl.java:121)
at org.apache.ratis.client.impl.AdminImpl.setConfiguration(AdminImpl.java:47)
at org.apache.ratis.client.api.AdminApi.setConfiguration(AdminApi.java:36)
at 
org.apache.iotdb.consensus.ratis.RatisConsensus.sendReconfiguration(RatisConsensus.java:621)
at 
org.apache.iotdb.consensus.ratis.RatisConsensus.addPeer(RatisConsensus.java:348)
at 
org.apache.iotdb.db.service.RegionMigrateService$RegionMigrateTask.addPeer(RegionMigrateService.java:259)
at 
org.apache.iotdb.db.service.RegionMigrateService$RegionMigrateTask.run(RegionMigrateService.java:160)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
 缩容196节点16:20:54报错日志如下
2022-06-22 16:20:54,050 [grpc-default-worker-ELG-3-12] DEBUG 
o.a.r.t.i.n.u.i.l.AbstractInternalLogger:214 - [id: 0x262750fe, 
L:/192.168.42.196:39172 - R:192.168.42.204/192.168.42.204:40090] INBOUND 
HEADERS: streamId=3 headers=GrpcHttp2ResponseHeaders[:status: 200, 
content-type: application/grpc, exception-type: 
org.apache.ratis.protocol.exceptions.ReconfigurationTimeoutException, 
exception-object-bin: 
rO0ABXNyAERvcmcuYXBhY2hlLnJhdGlzLnByb3RvY29sLmV4Y2VwdGlvbnMuUmVjb25maWd1cmF0aW9uVGltZW91dEV4Y2VwdGlvbmw90jhgLZ4SAgAAeHIAMm9yZy5hcGFjaGUucmF0aXMucHJvdG9jb2wuZXhjZXB0aW9ucy5SYWZ0RXhjZXB0aW9uAAAAAAAAAAECAAB4cgATamF2YS5pby5JT0V4Y2VwdGlvbmyAc2RlJfCrAgAAeHIAE2phdmEubGFuZy5FeGNlcHRpb27Q/R8+GjscxAIAAHhyABNqYXZhLmxhbmcuVGhyb3dhYmxl1cY1Jzl3uMsDAARMAAVjYXVzZXQAFUxqYXZhL2xhbmcvVGhyb3dhYmxlO0wADWRldGFpbE1lc3NhZ2V0ABJMamF2YS9sYW5nL1N0cmluZztbAApzdGFja1RyYWNldAAeW0xqYXZhL2xhbmcvU3RhY2tUcmFjZUVsZW1lbnQ7TAAUc3VwcHJlc3NlZEV4Y2VwdGlvbnN0ABBMamF2YS91dGlsL0xpc3Q7eHBxAH4ACXQBbDE5Mi4xNjguNDIuMjA0XzQwMDkwQGdyb3VwLTAwMDEwMDAwMDAwQi1Db25maWd1cmF0aW9uU3RhZ2luZ1N0YXRlOiBGYWlsIHRvIHNldCBjb25maWd1cmF0aW9uIFsxOTIuMTY4LjQyLjE2XzQwMDkwfHJwYzoxOTIuMTY4LjQyLjE2OjQwMDkwfGFkbWluOnxjbGllbnQ6fGRhdGFTdHJlYW06fHByaW9yaXR5OjAsIDE5Mi4xNjguNDIuMTk2XzQwMDkwfHJwYzoxOTIuMTY4LjQyLjE5Njo0MDA5MHxhZG1pbjp8Y2xpZW50OnxkYXRhU3RyZWFtOnxwcmlvcml0eTowLCAxOTIuMTY4LjQyLjIwNF80MDA5MHxycGM6MTkyLjE2OC40Mi4yMDQ6NDAwOTB8YWRtaW46fGNsaWVudDp8ZGF0YVN0cmVhbTp8cHJpb3JpdHk6MV0gZHVlIHRvIE5PUFJPR1JFU1N1cgAeW0xqYXZhLmxhbmcuU3RhY2tUcmFjZUVsZW1lbnQ7AkYqPDz9IjkCAAB4cAAAAARzcgAbamF2YS5sYW5nLlN0YWNrVHJhY2VFbGVtZW50YQnFmiY23YUCAARJAApsaW5lTnVtYmVyTAAOZGVjbGFyaW5nQ2xhc3NxAH4ABkwACGZpbGVOYW1lcQB+AAZMAAptZXRob2ROYW1lcQB+AAZ4cAAABEJ0AEZvcmcuYXBhY2hlLnJhdGlzLnNlcnZlci5pbXBsLkxlYWRlclN0YXRlSW1wbCRDb25maWd1cmF0aW9uU3RhZ2luZ1N0YXRldAAUTGVhZGVyU3RhdGVJbXBsLmphdmF0AARmYWlsc3EAfgANAAACwHQALG9yZy5hcGFjaGUucmF0aXMuc2VydmVyLmltcGwuTGVhZGVyU3RhdGVJbXBscQB+ABB0AAxjaGVja1N0YWdpbmdzcQB+AA0AAABfcQB+ABNxAH4AEHQACmFjY2VzcyQ1MDBzcQB+AA0AAAJ8dAA7b3JnLmFwYWNoZS5yYXRpcy5zZXJ2ZXIuaW1wbC5MZWFkZXJTdGF0ZUltcGwkRXZlbnRQcm9jZXNzb3JxAH4AEHQAA3J1bnNyACZqYXZhLnV0aWwuQ29sbGVjdGlvbnMkVW5tb2RpZmlhYmxlTGlzdPwPJTG17I4QAgABTAAEbGlzdHEAfgAIeHIALGphdmEudXRpbC5Db2xsZWN0aW9ucyRVbm1vZGlmaWFibGVDb2xsZWN0aW9uGUIAgMte9x4CAAFMAAFjdAAWTGphdmEvdXRpbC9Db2xsZWN0aW9uO3hwc3IAE2phdmEudXRpbC5BcnJheUxpc3R4gdIdmcdhnQMAAUkABHNpemV4cAAAAAB3BAAAAAB4cQB+AB94,
 grpc-status: 13, grpc-message: 
192.168.42.204_40090@group-00010000000B-ConfigurationStagingState: 
{color:red}Fail to set configuration 
{color}[192.168.42.16_40090|rpc:192.168.42.16:40090|admin:|client:|dataStream:|priority:0,
 
192.168.42.196_40090|rpc:192.168.42.196:40090|admin:|client:|dataStream:|priority:0,
 
192.168.42.204_40090|rpc:192.168.42.204:40090|admin:|client:|dataStream:|priority:1]
 due to NOPROGRESS] padding=0 endStream=true
2022-06-22 16:20:54,054 [pool-20-IoTDB-Region-Migrate-Pool-1] DEBUG 
o.a.r.c.i.RaftClientImpl:355 - client-7BC1E2CE1983: suggested new leader: null. 
Failed 
SetConfigurationRequest:client-7BC1E2CE1983->192.168.42.204_40090@group-00010000000B,
 cid=19, seq=0, RW, null, 
peers:[192.168.42.16_40090|rpc:192.168.42.16:40090|priority:0, 
192.168.42.196_40090|rpc:192.168.42.196:40090|admin:|client:|dataStream:|priority:0,
 
192.168.42.204_40090|rpc:192.168.42.204:40090|admin:|client:|dataStream:|priority:1]
 with {}
org.apache.ratis.protocol.exceptions.ReconfigurationTimeoutException: 
192.168.42.204_40090@group-00010000000B-ConfigurationStagingState: Fail to set 
configuration 
[192.168.42.16_40090|rpc:192.168.42.16:40090|admin:|client:|dataStream:|priority:0,
 
192.168.42.196_40090|rpc:192.168.42.196:40090|admin:|client:|dataStream:|priority:0,
 
192.168.42.204_40090|rpc:192.168.42.204:40090|admin:|client:|dataStream:|priority:1]
 due to NOPROGRESS
at 
org.apache.ratis.server.impl.LeaderStateImpl$ConfigurationStagingState.fail(LeaderStateImpl.java:1090)
at 
org.apache.ratis.server.impl.LeaderStateImpl.checkStaging(LeaderStateImpl.java:704)
at 
org.apache.ratis.server.impl.LeaderStateImpl.access$500(LeaderStateImpl.java:95)
at 
org.apache.ratis.server.impl.LeaderStateImpl$EventProcessor.run(LeaderStateImpl.java:636)
3. 看204节点日志16:20:54
2022-06-22 16:20:54,044 
[192.168.42.204_40090@group-00010000000B-LeaderStateImpl] DEBUG 
o.a.r.s.i.LeaderStateImpl$ConfigurationStagingState:1085 - 
192.168.42.204_40090@group-00010000000B-ConfigurationStagingState: Fail to set 
configuration 
[192.168.42.16_40090|rpc:192.168.42.16:40090|admin:|client:|dataStream:|priority:0,
 
192.168.42.196_40090|rpc:192.168.42.196:40090|admin:|client:|dataStream:|priority:0,
 
192.168.42.204_40090|rpc:192.168.42.204:40090|admin:|client:|dataStream:|priority:1]
 due to NOPROGRESS
2022-06-22 16:20:54,044 
[192.168.42.204_40090@group-00010000000B-LeaderStateImpl] DEBUG 
o.a.r.m.RatisMetrics:46 - Unregistering Metrics Registry : 
ratis_grpc.log_appender.192.168.42.204_40090@group-00010000000B
2022-06-22 16:20:54,044 
[192.168.42.204_40090@group-00010000000B-LeaderStateImpl] DEBUG 
o.a.r.u.LifeCycle$State:116 - 
192.168.42.204_40090@group-00010000000B->192.168.42.16_40090-GrpcLogAppender-LogAppenderDaemon:
 RUNNING -> CLOSING
2022-06-22 16:20:54,045 
[192.168.42.204_40090@group-00010000000B->192.168.42.16_40090-GrpcLogAppender-LogAppenderDaemon]
 DEBUG o.a.r.u.LifeCycle$State:116 - 
192.168.42.204_40090@group-00010000000B->192.168.42.16_40090-GrpcLogAppender-LogAppenderDaemon:
 CLOSING -> CLOSED
2022-06-22 16:20:54,047 [grpc-default-worker-ELG-3-13] DEBUG 
o.a.r.t.i.n.u.i.l.AbstractInternalLogger:214 - [id: 0x38e2240f, 
L:/192.168.42.204:40090 - R:/192.168.42.196:39172] OUTBOUND HEADERS: streamId=3 
headers=GrpcHttp2OutboundHeaders[:status: 200, content-type: application/grpc, 
exception-type: 
org.apache.ratis.protocol.exceptions.ReconfigurationTimeoutException, 
exception-object-bin: 
rO0ABXNyAERvcmcuYXBhY2hlLnJhdGlzLnByb3RvY29sLmV4Y2VwdGlvbnMuUmVjb25maWd1cmF0aW9uVGltZW91dEV4Y2VwdGlvbmw90jhgLZ4SAgAAeHIAMm9yZy5hcGFjaGUucmF0aXMucHJvdG9jb2wuZXhjZXB0aW9ucy5SYWZ0RXhjZXB0aW9uAAAAAAAAAAECAAB4cgATamF2YS5pby5JT0V4Y2VwdGlvbmyAc2RlJfCrAgAAeHIAE2phdmEubGFuZy5FeGNlcHRpb27Q/R8+GjscxAIAAHhyABNqYXZhLmxhbmcuVGhyb3dhYmxl1cY1Jzl3uMsDAARMAAVjYXVzZXQAFUxqYXZhL2xhbmcvVGhyb3dhYmxlO0wADWRldGFpbE1lc3NhZ2V0ABJMamF2YS9sYW5nL1N0cmluZztbAApzdGFja1RyYWNldAAeW0xqYXZhL2xhbmcvU3RhY2tUcmFjZUVsZW1lbnQ7TAAUc3VwcHJlc3NlZEV4Y2VwdGlvbnN0ABBMamF2YS91dGlsL0xpc3Q7eHBxAH4ACXQBbDE5Mi4xNjguNDIuMjA0XzQwMDkwQGdyb3VwLTAwMDEwMDAwMDAwQi1Db25maWd1cmF0aW9uU3RhZ2luZ1N0YXRlOiBGYWlsIHRvIHNldCBjb25maWd1cmF0aW9uIFsxOTIuMTY4LjQyLjE2XzQwMDkwfHJwYzoxOTIuMTY4LjQyLjE2OjQwMDkwfGFkbWluOnxjbGllbnQ6fGRhdGFTdHJlYW06fHByaW9yaXR5OjAsIDE5Mi4xNjguNDIuMTk2XzQwMDkwfHJwYzoxOTIuMTY4LjQyLjE5Njo0MDA5MHxhZG1pbjp8Y2xpZW50OnxkYXRhU3RyZWFtOnxwcmlvcml0eTowLCAxOTIuMTY4LjQyLjIwNF80MDA5MHxycGM6MTkyLjE2OC40Mi4yMDQ6NDAwOTB8YWRtaW46fGNsaWVudDp8ZGF0YVN0cmVhbTp8cHJpb3JpdHk6MV0gZHVlIHRvIE5PUFJPR1JFU1N1cgAeW0xqYXZhLmxhbmcuU3RhY2tUcmFjZUVsZW1lbnQ7AkYqPDz9IjkCAAB4cAAAAARzcgAbamF2YS5sYW5nLlN0YWNrVHJhY2VFbGVtZW50YQnFmiY23YUCAARJAApsaW5lTnVtYmVyTAAOZGVjbGFyaW5nQ2xhc3NxAH4ABkwACGZpbGVOYW1lcQB+AAZMAAptZXRob2ROYW1lcQB+AAZ4cAAABEJ0AEZvcmcuYXBhY2hlLnJhdGlzLnNlcnZlci5pbXBsLkxlYWRlclN0YXRlSW1wbCRDb25maWd1cmF0aW9uU3RhZ2luZ1N0YXRldAAUTGVhZGVyU3RhdGVJbXBsLmphdmF0AARmYWlsc3EAfgANAAACwHQALG9yZy5hcGFjaGUucmF0aXMuc2VydmVyLmltcGwuTGVhZGVyU3RhdGVJbXBscQB+ABB0AAxjaGVja1N0YWdpbmdzcQB+AA0AAABfcQB+ABNxAH4AEHQACmFjY2VzcyQ1MDBzcQB+AA0AAAJ8dAA7b3JnLmFwYWNoZS5yYXRpcy5zZXJ2ZXIuaW1wbC5MZWFkZXJTdGF0ZUltcGwkRXZlbnRQcm9jZXNzb3JxAH4AEHQAA3J1bnNyACZqYXZhLnV0aWwuQ29sbGVjdGlvbnMkVW5tb2RpZmlhYmxlTGlzdPwPJTG17I4QAgABTAAEbGlzdHEAfgAIeHIALGphdmEudXRpbC5Db2xsZWN0aW9ucyRVbm1vZGlmaWFibGVDb2xsZWN0aW9uGUIAgMte9x4CAAFMAAFjdAAWTGphdmEvdXRpbC9Db2xsZWN0aW9uO3hwc3IAE2phdmEudXRpbC5BcnJheUxpc3R4gdIdmcdhnQMAAUkABHNpemV4cAAAAAB3BAAAAAB4cQB+AB94,
 grpc-status: 13, grpc-message: 
192.168.42.204_40090@group-00010000000B-ConfigurationStagingState: Fail to set 
configuration 
[192.168.42.16_40090|rpc:192.168.42.16:40090|admin:|client:|dataStream:|priority:0,
 
192.168.42.196_40090|rpc:192.168.42.196:40090|admin:|client:|dataStream:|priority:0,
 
192.168.42.204_40090|rpc:192.168.42.204:40090|admin:|client:|dataStream:|priority:1]
 due to NOPROGRESS] padding=0 endStream=true
*然后过了约一分钟,报之前的Broken pipe。 由于add peer在16:20:54就失败了,所以不是后面的Broken pipe直接导致的
*2022-06-22 16:21:36,824 [ForkJoinPool.commonPool-worker-9] DEBUG 
o.a.r.g.s.GrpcServerProtocolService$ServerRequestStreamObserver:102 - 
192.168.42.204_40090: reply 
192.168.42.196_40090<-192.168.42.204_40090#250:OK-t1,SUCCESS,nextIndex=1,followerCommit=0
2022-06-22 16:21:36,824 [grpc-default-worker-ELG-3-3] DEBUG 
o.a.r.t.i.n.u.i.l.AbstractInternalLogger:214 - [id: 0x9d090570, 
L:/192.168.42.204:40090 - R:/192.168.42.196:38776] OUTBOUND DATA: streamId=13 
padding=0 endStream=false length=93 
bytes=00000000580a450a143139322e3136382e34322e3139365f343030393012143139322e3136382e34322e3230345f34303039301a120a10474747474747474747...
2022-06-22 16:21:36,857 [grpc-default-worker-ELG-3-2] DEBUG 
o.a.r.t.i.n.u.i.l.AbstractInternalLogger:214 - [id: 0x154a1b7a, 
L:/192.168.42.204:52562 - R:192.168.42.16/192.168.42.16:50090] OUTBOUND 
GO_AWAY: lastStreamId=2147483647 errorCode=2 length=48 
bytes=7265616441646472657373282e2e29206661696c65643a20436f6e6e656374696f6e2072657365742062792070656572
2022-06-22 16:21:36,859 [grpc-default-worker-ELG-3-2] DEBUG 
o.a.r.t.i.n.h.c.h.Http2ConnectionHandler:928 - [id: 0x154a1b7a, 
L:/192.168.42.204:52562 ! R:192.168.42.16/192.168.42.16:50090] Sending GOAWAY 
failed: lastStreamId '2147483647', errorCode '2', debugData 'readAddress(..) 
failed: Connection reset by peer'. Forcing shutdown of the connection.
org.apache.ratis.thirdparty.io.netty.channel.unix.Errors$NativeIoException: 
writevAddresses(..) failed: Broken pipe

> DataNode缩容过程中,Region迁移时,ratis server退出
> --------------------------------------
>
>                 Key: IOTDB-3452
>                 URL: https://issues.apache.org/jira/browse/IOTDB-3452
>             Project: Apache IoTDB
>          Issue Type: Test
>            Reporter: QiangShaowei
>            Assignee: Song Ziyang
>            Priority: Major
>         Attachments: 128_log_all.rar, 128_log_debug.rar, 16$196日志.rar, 
> 74_log_all.rar, 74_log_debug.rar, 91_log_all.rar, 91_log_debug.rar, 
> config&204日志.rar, confignode--log_debug.rar, confignode_log_all.rar
>
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)

Reply via email to