[
https://issues.apache.org/jira/browse/IOTDB-3965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17583494#comment-17583494
]
刘珍 commented on IOTDB-3965:
---------------------------
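With the 0819 installation package provided by Ziyang, the md5 error no longer appears, but the scale-in still does not complete:
Scale-in of ip4; the scale-in command returned: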
2022-08-23 17:49:11,000 [main] INFO o.a.i.d.s.DataNodeServerCommandLine:160 -
Remove result TDataNodeRemoveResp(status:TSStatus(code:200, message:Server
accept the request))
ip5:
2022-08-23 17:49:20,259
[192.168.130.5_40010@group-000100000002-SegmentedRaftLogWorker] ERROR
o.a.r.s.r.s.SegmentedRaftLogWorker:345 -
192.168.130.5_40010@group-000100000002-SegmentedRaftLogWorker hit exception
java.lang.IllegalStateException:
192.168.130.5_40010@group-000100000002-SegmentedRaftLogWorker: File
/data2/cluster_test/remove_datanode_0819/datanode/sbin/../data/consensus/data_region/47474747-4747-4747-4747-000100000002/current/log_inprogress_331354
to be rolled does not exist
at org.apache.ratis.util.Preconditions.assertTrue(Preconditions.java:72)
at
org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker$FinalizeLogSegment.execute(SegmentedRaftLogWorker.java:578)
at
org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker.run(SegmentedRaftLogWorker.java:310)
at java.lang.Thread.run(Thread.java:748)
2022-08-23 17:50:29,558 [grpc-default-executor-6] ERROR
o.a.r.s.i.SnapshotInstallationHandler:96 -
192.168.130.5_40010@group-000100000002: installSnapshot failed
org.apache.ratis.protocol.exceptions.ServerNotReadyException:
192.168.130.5_40010@group-000100000002 is not in [STARTING, RUNNING]: current
state is CLOSING
at
org.apache.ratis.server.impl.RaftServerImpl.lambda$assertLifeCycleState$9(RaftServerImpl.java:713)
at
org.apache.ratis.util.LifeCycle.assertCurrentState(LifeCycle.java:253)
at
org.apache.ratis.server.impl.RaftServerImpl.assertLifeCycleState(RaftServerImpl.java:712)
at
org.apache.ratis.server.impl.SnapshotInstallationHandler.installSnapshotImpl(SnapshotInstallationHandler.java:112)
at
org.apache.ratis.server.impl.SnapshotInstallationHandler.installSnapshot(SnapshotInstallationHandler.java:94)
at
org.apache.ratis.server.impl.RaftServerImpl.installSnapshot(RaftServerImpl.java:1494)
at
org.apache.ratis.server.impl.RaftServerProxy.installSnapshot(RaftServerProxy.java:634)
at
org.apache.ratis.grpc.server.GrpcServerProtocolService$2.process(GrpcServerProtocolService.java:244)
at
org.apache.ratis.grpc.server.GrpcServerProtocolService$2.process(GrpcServerProtocolService.java:241)
at
org.apache.ratis.grpc.server.GrpcServerProtocolService$ServerRequestStreamObserver.onNext(GrpcServerProtocolService.java:126)
at
org.apache.ratis.thirdparty.io.grpc.stub.ServerCalls$StreamingServerCallHandler$StreamingServerCallListener.onMessage(ServerCalls.java:262)
at
org.apache.ratis.thirdparty.io.grpc.ForwardingServerCallListener.onMessage(ForwardingServerCallListener.java:33)
at
org.apache.ratis.thirdparty.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.messagesAvailableInternal(ServerCallImpl.java:332)
at
org.apache.ratis.thirdparty.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.messagesAvailable(ServerCallImpl.java:315)
at
org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1MessagesAvailable.runInContext(ServerImpl.java:834)
at
org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
at
org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
> [ratis md5 ] ERROR o.a.i.c.r.FileInfoWithDelayedMd5Computing:64 - compute
> file digest for .tsfile.resource.md5.tmp failed due to {}
> -----------------------------------------------------------------------------------------------------------------------------------
>
> Key: IOTDB-3965
> URL: https://issues.apache.org/jira/browse/IOTDB-3965
> Project: Apache IoTDB
> Issue Type: Bug
> Components: mpp-cluster
> Affects Versions: 0.14.0-SNAPSHOT
> Reporter: 刘珍
> Assignee: Song Ziyang
> Priority: Major
> Attachments: ip3_config.properties, ip4_config.properties,
> ip5_config.properties
>
>
> RatisConsensus, 3 replicas, 3C5D (3 ConfigNodes, 5 DataNodes)
> Removing 1 datanode (which holds 2 dataregions): the datanode being removed reports errors, and add new peer fails:
> 2022-07-26 15:05:47,271 [ForkJoinPool.commonPool-worker-13] ERROR
> o.a.i.c.r.FileInfoWithDelayedMd5Computing:64 - compute file digest for
> /data2/cluster_test/shrink_0726_ratis/datanode/sbin/../data/consensus/data_region/47474747-4747-4747-4747-000100000004/sm/1_400002/sequence/root.test.g0_0/4/0/1658816468388-9-0-0.tsfile.md5.tmp
> failed due to {}
> java.io.FileNotFoundException:
> /data2/cluster_test/shrink_0726_ratis/datanode/sbin/../data/consensus/data_region/47474747-4747-4747-4747-000100000004/sm/1_400002/sequence/root.test.g0_0/4/0/1658816468388-9-0-0.tsfile.md5.tmp
> (No such file or directory)
> at java.io.FileInputStream.open0(Native Method)
> at java.io.FileInputStream.open(FileInputStream.java:195)
> at java.io.FileInputStream.<init>(FileInputStream.java:138)
> at
> org.apache.ratis.util.MD5FileUtil.computeMd5ForFile(MD5FileUtil.java:125)
> at
> org.apache.iotdb.consensus.ratis.FileInfoWithDelayedMd5Computing.lambda$new$0(FileInfoWithDelayedMd5Computing.java:61)
> at
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
> at
> java.util.concurrent.CompletableFuture$AsyncSupply.exec(CompletableFuture.java:1596)
> at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
> at
> java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1067)
> at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1703)
> at
> java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:172)
> 2022-07-26 14:54:48,514 [pool-16-IoTDB-Region-Migrate-Pool-1] ERROR
> o.a.i.d.s.RegionMigrateService$RegionMigrateTask:348 - add new peer
> TEndPoint(ip:192.168.130.4, port:40010) for region DataRegion[5] failed,
> resp: ConsensusGenericResponse{success=false}
> exception=org.apache.iotdb.consensus.exception.RatisRequestFailedException:
> Ratis request failed
> Steps to reproduce:
> 1. 192.168.130.1/2/3/4/5
> schema_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus
> data_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus
> schema_replication_factor=3
> data_replication_factor=3
> confignode:
> MAX_HEAP_SIZE="4G"
> datanode:
> MAX_HEAP_SIZE="16G"
> max_waiting_time_when_insert_blocked=3600000
> query_timeout_threshold=36000000
> 2. The benchmark runs on ip2
> See the attachments for the configuration files
> 3. Remove ip5, which holds 2 dataregions
> ./sbin/remove-datanode.sh 192.168.130.5:6667
> One dataregion on ip5 has taken 1 snapshot
> For the other dataregion, the leader has taken 1 snapshot; ip5 is syncing slowly and has not taken a snapshot yet
> Check the log on ip5.
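
The FileNotFoundException in the quoted trace is raised inside a CompletableFuture task, so the file is opened some time after the digest was requested. The following is a minimal sketch of that pattern, not the IoTDB or Ratis implementation: the class name DelayedMd5Sketch and the helpers md5IfPresent and md5Async are hypothetical, and the sketch only assumes that the file may have disappeared by the time the task runs. It reports a missing file as an empty result instead of logging an error.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Optional;
import java.util.concurrent.CompletableFuture;

// Hypothetical illustration only; names are not taken from IoTDB or Ratis.
public class DelayedMd5Sketch {

  // Returns the hex MD5 of the file, or empty if the file no longer exists
  // when it is opened.
  static Optional<String> md5IfPresent(Path file) {
    try (InputStream in = Files.newInputStream(file)) {
      MessageDigest md = MessageDigest.getInstance("MD5");
      byte[] buf = new byte[8192];
      int n;
      while ((n = in.read(buf)) != -1) {
        md.update(buf, 0, n);
      }
      StringBuilder hex = new StringBuilder();
      for (byte b : md.digest()) {
        hex.append(String.format("%02x", b));
      }
      return Optional.of(hex.toString());
    } catch (NoSuchFileException e) {
      // The file was removed between scheduling and execution of the task.
      return Optional.empty();
    } catch (IOException | NoSuchAlgorithmException e) {
      throw new IllegalStateException(e);
    }
  }

  // Schedules the digest on the common ForkJoinPool, as in the quoted trace.
  static CompletableFuture<Optional<String>> md5Async(Path file) {
    return CompletableFuture.supplyAsync(() -> md5IfPresent(file));
  }

  public static void main(String[] args) throws Exception {
    Path tmp = Files.createTempFile("sketch", ".tsfile");
    Files.write(tmp, "abc".getBytes(StandardCharsets.UTF_8));
    System.out.println("present: " + md5Async(tmp).get());

    Files.delete(tmp); // simulate the file disappearing before the digest runs
    System.out.println("deleted: " + md5Async(tmp).get());
  }
}
```

Whether a missing file should be skipped, retried, or treated as a failed snapshot depends on the snapshot protocol; the sketch only shows where the window between scheduling and opening the file lies.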