[
https://issues.apache.org/jira/browse/IOTDB-4830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17641047#comment-17641047
]
刘珍 commented on IOTDB-4830:
---------------------------
rel/1.0 1130_40de3ad
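Private cloud, 3 replicas, 3C5D (3 ConfigNodes, 5 DataNodes)
1. Start the 3-replica 3C5D cluster.
2. Stop the DataNode on ip3.
3. Write data with BM (benchmark); the write completes.
4. Scale in (remove) the DataNode on ip3; the removal succeeds.
Check the ConfigNode Leader's log: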
2022-11-30 10:52:40,849 [ForkJoinPool.commonPool-worker-5] ERROR
o.a.i.c.p.TriggerInfo:246 - Failed to take snapshot, because snapshot file
[/data/iotdb/r_1130_40de3ad/sbin/../data/confignode/consensus/47474747-4747-4747-4747-000000000000/sm/.tmp.1_20583/trigger_info.bin]
is already exist.
2022-11-30 10:54:41,161 [ForkJoinPool.commonPool-worker-1] ERROR
o.a.i.c.p.TriggerInfo:246 - Failed to take snapshot, because snapshot file
[/data/iotdb/r_1130_40de3ad/sbin/../data/confignode/consensus/47474747-4747-4747-4747-000000000000/sm/.tmp.1_20583/trigger_info.bin]
is already exist.
2022-11-30 10:56:41,474 [ForkJoinPool.commonPool-worker-1] ERROR
o.a.i.c.p.TriggerInfo:246 - Failed to take snapshot, because snapshot file
[/data/iotdb/r_1130_40de3ad/sbin/../data/confignode/consensus/47474747-4747-4747-4747-000000000000/sm/.tmp.1_20583/trigger_info.bin]
is already exist.
2022-11-30 10:58:41,789 [ForkJoinPool.commonPool-worker-0] ERROR
o.a.i.c.p.TriggerInfo:246 - Failed to take snapshot, because snapshot file
[/data/iotdb/r_1130_40de3ad/sbin/../data/confignode/consensus/47474747-4747-4747-4747-000000000000/sm/.tmp.1_20583/trigger_info.bin]
is already exist.
2022-11-30 11:00:42,105 [ForkJoinPool.commonPool-worker-6] ERROR
o.a.i.c.p.TriggerInfo:246 - Failed to take snapshot, because snapshot file
[/data/iotdb/r_1130_40de3ad/sbin/../data/confignode/consensus/47474747-4747-4747-4747-000000000000/sm/.tmp.1_20583/trigger_info.bin]
is already exist.
2022-11-30 11:02:42,401 [ForkJoinPool.commonPool-worker-0] ERROR
o.a.i.c.p.TriggerInfo:246 - Failed to take snapshot, because snapshot file
[/data/iotdb/r_1130_40de3ad/sbin/../data/confignode/consensus/47474747-4747-4747-4747-000000000000/sm/.tmp.1_20583/trigger_info.bin]
is already exist.
2022-11-30 11:04:42,686 [0@group-000000000000-StateMachineUpdater] ERROR
o.a.i.c.p.TriggerInfo:246 - Failed to take snapshot, because snapshot file
[/data/iotdb/r_1130_40de3ad/sbin/../data/confignode/consensus/47474747-4747-4747-4747-000000000000/sm/.tmp.1_20583/trigger_info.bin]
is already exist.
2022-11-30 11:06:42,972 [ForkJoinPool.commonPool-worker-5] ERROR
o.a.i.c.p.TriggerInfo:246 - Failed to take snapshot, because snapshot file
[/data/iotdb/r_1130_40de3ad/sbin/../data/confignode/consensus/47474747-4747-4747-4747-000000000000/sm/.tmp.1_20583/trigger_info.bin]
is already exist.
2022-11-30 11:11:48,561 [ProcExecWorker-2] ERROR
o.a.i.c.c.s.SyncDataNodeClientPool:97 - {color:#DE350B}SET_SYSTEM_STATUS failed
on DataNode TEndPoint(ip:172.20.70.3, port:9003)
java.io.IOException: Borrow client from pool for node TEndPoint(ip:172.20.70.3,
port:9003) failed, you need to increase
dn_max_connection_for_internal_service.{color}
at
org.apache.iotdb.commons.client.ClientManager.borrowClient(ClientManager.java:64)
at
org.apache.iotdb.confignode.client.sync.SyncDataNodeClientPool.sendSyncRequestToDataNodeWithGivenRetry(SyncDataNodeClientPool.java:87)
at
org.apache.iotdb.confignode.procedure.env.ConfigNodeProcedureEnv.markDataNodeAsRemovingAndBroadcast(ConfigNodeProcedureEnv.java:373)
at
org.apache.iotdb.confignode.procedure.impl.node.RemoveDataNodeProcedure.executeFromState(RemoveDataNodeProcedure.java:86)
at
org.apache.iotdb.confignode.procedure.impl.node.RemoveDataNodeProcedure.executeFromState(RemoveDataNodeProcedure.java:47)
at
org.apache.iotdb.confignode.procedure.impl.statemachine.StateMachineProcedure.execute(StateMachineProcedure.java:186)
at
org.apache.iotdb.confignode.procedure.Procedure.doExecute(Procedure.java:365)
at
org.apache.iotdb.confignode.procedure.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:414)
at
org.apache.iotdb.confignode.procedure.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:373)
at
org.apache.iotdb.confignode.procedure.ProcedureExecutor.access$300(ProcedureExecutor.java:50)
at
org.apache.iotdb.confignode.procedure.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:741)
Caused by: net.sf.cglib.core.CodeGenerationException:
org.apache.thrift.transport.TTransportException-->java.net.ConnectException:
Connection refused (Connection refused)
at net.sf.cglib.core.ReflectUtils.newInstance(ReflectUtils.java:235)
at net.sf.cglib.core.ReflectUtils.newInstance(ReflectUtils.java:220)
at net.sf.cglib.proxy.Enhancer.createUsingReflection(Enhancer.java:639)
at net.sf.cglib.proxy.Enhancer.firstInstance(Enhancer.java:538)
at
net.sf.cglib.core.AbstractClassGenerator.create(AbstractClassGenerator.java:225)
at net.sf.cglib.proxy.Enhancer.createHelper(Enhancer.java:377)
at net.sf.cglib.proxy.Enhancer.create(Enhancer.java:304)
at
org.apache.iotdb.commons.client.sync.SyncThriftClientWithErrorHandler.newErrorHandler(SyncThriftClientWithErrorHandler.java:48)
at
org.apache.iotdb.commons.client.sync.SyncDataNodeInternalServiceClient$Factory.makeObject(SyncDataNodeInternalServiceClient.java:127)
at
org.apache.iotdb.commons.client.sync.SyncDataNodeInternalServiceClient$Factory.makeObject(SyncDataNodeInternalServiceClient.java:105)
at
org.apache.commons.pool2.impl.GenericKeyedObjectPool.create(GenericKeyedObjectPool.java:780)
at
org.apache.commons.pool2.impl.GenericKeyedObjectPool.borrowObject(GenericKeyedObjectPool.java:439)
at
org.apache.commons.pool2.impl.GenericKeyedObjectPool.borrowObject(GenericKeyedObjectPool.java:350)
at
org.apache.iotdb.commons.client.ClientManager.borrowClient(ClientManager.java:50)
... 10 common frames omitted
Caused by: org.apache.thrift.transport.TTransportException:
java.net.ConnectException: Connection refused (Connection refused)
at org.apache.thrift.transport.TSocket.open(TSocket.java:243)
at
org.apache.iotdb.rpc.TElasticFramedTransport.open(TElasticFramedTransport.java:91)
at
org.apache.iotdb.commons.client.sync.SyncDataNodeInternalServiceClient.<init>(SyncDataNodeInternalServiceClient.java:63)
at
org.apache.iotdb.commons.client.sync.SyncDataNodeInternalServiceClient$$EnhancerByCGLIB$$b73d1a05.<init>(<generated>)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at net.sf.cglib.core.ReflectUtils.newInstance(ReflectUtils.java:228)
... 23 common frames omitted
Caused by: java.net.ConnectException: Connection refused (Connection refused)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at
java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at
java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at
java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at org.apache.thrift.transport.TSocket.open(TSocket.java:238)
... 31 common frames omitted
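Note on the trace above: the top-level message suggests increasing dn_max_connection_for_internal_service, but the innermost cause is java.net.ConnectException: Connection refused, which is consistent with the target DataNode having been stopped in step 2. The following is a minimal sketch, using only plain JDK classes, of walking a cause chain to tell "connection refused" apart from pool exhaustion. It is not IoTDB code: the class and method names are hypothetical, and a RuntimeException stands in for TTransportException.

```java
import java.io.IOException;
import java.net.ConnectException;

// Hypothetical helper, not part of IoTDB.
class RootCauseExample {
    /** Walks the cause chain and returns the innermost Throwable. */
    static Throwable rootCause(Throwable t) {
        Throwable current = t;
        // Stop at the end of the chain; also guard against a self-referential cause.
        while (current.getCause() != null && current.getCause() != current) {
            current = current.getCause();
        }
        return current;
    }

    /** True when the innermost cause is a refused TCP connection. */
    static boolean isConnectionRefused(Throwable t) {
        return rootCause(t) instanceof ConnectException;
    }

    public static void main(String[] args) {
        // Rebuild the shape of the chain from the log with plain JDK exceptions.
        Throwable refused = new ConnectException("Connection refused (Connection refused)");
        Throwable transport = new RuntimeException("stand-in for TTransportException", refused);
        Throwable borrow = new IOException("Borrow client from pool failed", transport);

        System.out.println(isConnectionRefused(borrow));
        System.out.println(rootCause(borrow).getMessage());
    }
}
```

With a check like this, a caller could log "node unreachable" for a stopped DataNode instead of the connection-pool hint. Whether that fits the actual client pool code is an assumption.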
> [SchemaRegion migrated failed] remove datanode that has stopped ,confignode
> executes “DELETE_OLD_REGION_PEER” on this datanode
> ------------------------------------------------------------------------------------------------------------------------------
>
> Key: IOTDB-4830
> URL: https://issues.apache.org/jira/browse/IOTDB-4830
> Project: Apache IoTDB
> Issue Type: Bug
> Components: mpp-cluster
> Affects Versions: 0.14.0-SNAPSHOT
> Reporter: 刘珍
> Assignee: 陈哲涵
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.14.0-SNAPSHOT
>
> Attachments: image-2022-11-02-14-55-28-013.png,
> image-2022-11-15-14-35-54-026.png, image-2022-11-15-14-37-38-147.png,
> image-2022-11-15-15-10-58-501.png, image-2022-11-15-15-12-05-884.png,
> iotdb_4830.conf, screenshot-1.png
>
>
> m_1102_09e2566
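> 1. Start a 3-replica 3C5D cluster.
> 2. Stop the ip76 DataNode normally with the stop-datanode.sh script.
> 3. The benchmark finishes writing data.
> 4. Scale in (remove) the offline ip76 DataNode.
> The ConfigNode retries connecting to ip76, and DELETE_OLD_REGION_PEER is
> retried as well. DELETE_OLD_REGION_PEER need not be executed, because it is
> not a retry issued after the scale-in started: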
> 2022-11-02 14:34:23,637 [ProcExecWorker-9] ERROR
> o.a.i.c.c.s.SyncDataNodeClientPool:113 -
> {color:#DE350B}*DELETE_OLD_REGION_PEER*{color} failed on DataNode
> TEndPoint(ip:192.168.10.76, port:9003)
> 5. Start the ip76 DataNode. The remove can be seen starting to execute on
> ip76, but the node's status at this point is Running; it should be Removing.
> ip76 DataNode log (the remove is already executing):
> 2022-11-02 14:38:45,611 [pool-53-IoTDB-Region-Migrate-Pool-1] INFO
> o.a.i.d.s.RegionMigrateService$DeleteOldRegionPeerTask:493 - succeed to
> remove region DataRegion[12] consensus group
> Cluster node status at this point:
> !image-2022-11-02-14-55-28-013.png!
> TEST ENV
> 192.168.10.72~76