[
https://issues.apache.org/jira/browse/IOTDB-5063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17714494#comment-17714494
]
Jinrui Zhang commented on IOTDB-5063:
-------------------------------------
I suggest re-testing this scenario.
> [ start datanode ] Failed to start Grpc server
> ----------------------------------------------
>
> Key: IOTDB-5063
> URL: https://issues.apache.org/jira/browse/IOTDB-5063
> Project: Apache IoTDB
> Issue Type: Bug
> Components: mpp-cluster
> Affects Versions: 0.14.0-SNAPSHOT
> Reporter: 刘珍
> Assignee: Jinrui Zhang
> Priority: Blocker
> Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png,
> screenshot-4.png, screenshot-5.png
>
> Original Estimate: 48h
> Remaining Estimate: 48h
>
> master : 1127_4d7c15d
> 1. Start 3 ConfigNodes.
> 2. Start 21 DataNodes. One DataNode always fails to start ({color:#DE350B}reproduced in all 3 attempts{color}). Two different errors were observed:
> Error 1 (seen twice):
> 2022-11-28 09:44:11,906 [main] ERROR o.a.ratis.util.ExitUtils:133 - Terminating with exit status 1: Failed to start Grpc server
> java.io.IOException: Failed to bind to address 0.0.0.0/0.0.0.0:50010
> at org.apache.ratis.thirdparty.io.grpc.netty.NettyServer.start(NettyServer.java:328)
> at org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl.start(ServerImpl.java:183)
> at org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl.start(ServerImpl.java:92)
> at org.apache.ratis.grpc.server.GrpcService.startImpl(GrpcService.java:266)
> at org.apache.ratis.util.LifeCycle.startAndTransition(LifeCycle.java:270)
> at org.apache.ratis.server.RaftServerRpcWithProxy.start(RaftServerRpcWithProxy.java:72)
> at org.apache.ratis.server.impl.RaftServerProxy.startImpl(RaftServerProxy.java:394)
> at org.apache.ratis.util.LifeCycle.startAndTransition(LifeCycle.java:270)
> at org.apache.ratis.server.impl.RaftServerProxy.start(RaftServerProxy.java:387)
> at org.apache.iotdb.consensus.ratis.RatisConsensus.start(RatisConsensus.java:156)
> at org.apache.iotdb.db.service.DataNode.active(DataNode.java:319)
> at org.apache.iotdb.db.service.DataNode.doAddNode(DataNode.java:162)
> at org.apache.iotdb.db.service.DataNodeServerCommandLine.run(DataNodeServerCommandLine.java:95)
> at org.apache.iotdb.commons.ServerCommandLine.doMain(ServerCommandLine.java:58)
> at org.apache.iotdb.db.service.DataNode.main(DataNode.java:132)
> Caused by: org.apache.ratis.thirdparty.io.netty.channel.unix.Errors$NativeIoException: bind(..) failed: Address already in use
> 2022-11-28 09:44:11,910 [Thread-0] ERROR o.a.ratis.util.ExitUtils:133 - Terminating with exit status -1: Thread[Thread-0,5,main] has thrown an uncaught exception
> java.lang.NullPointerException: null
> at org.apache.iotdb.db.service.IoTDBShutdownHook.run(IoTDBShutdownHook.java:60)
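> Port information for the datanode process on this node:
> !screenshot-2.png!
> Error 2 (seen once):
> !screenshot-3.png!
> Port information for the datanode process on this node:
> !screenshot-4.png!
> Port information for a datanode that started successfully:
> !screenshot-5.png!
> Test environment: private cloud phase 1, 8C32GB, 24 machines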
> 1. ConfigNode configuration
> MAX_HEAP_SIZE="20G"
> MAX_DIRECT_MEMORY_SIZE="6G"
> 2. DataNode configuration
> MAX_HEAP_SIZE="20G"
> MAX_DIRECT_MEMORY_SIZE="6G"
> 3. Common configuration
> schema_replication_factor=3
> data_replication_factor=3
> 4. Start 3 ConfigNodes (ip23, 24, 25)
> 5. Start 21 DataNodes with the following script (the 21 DataNode start commands are issued 1 second apart)
> [root@i-66xazbht deploy_mpp_scripts]# cat 4_start_data_node.sh
> #!/bin/bash
> cluster_dir="/data/iotdb"
> cur_cluster="m_1127_4d7c15d"
> u_name="root"
> exec 3<datanode.txt
> while read line <&3
> do
> ssh ${u_name}@${line} "source /etc/profile;${cluster_dir}/${cur_cluster}/sbin/start-datanode.sh > /dev/null 2>&1 &"
> sleep 1
> done
> 6. Check the cluster information: one datanode is always shown as Unknown. Checked the log on that node.
> !screenshot-1.png!
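For the re-test, it may help to check on the failing node whether something is already listening on port 50010 (the address in the bind failure quoted above) before start-datanode.sh is launched. The following is only a minimal sketch, assuming a Linux host where `ss` from iproute2 is available. The helper name `port_listed` is hypothetical and not part of IoTDB. It reads `ss -ltn`-style lines on stdin and assumes the local address is the fourth column.

```shell
#!/bin/bash
# Hypothetical helper (not part of IoTDB): succeed if the given TCP port
# shows up as a listening socket in `ss -ltn`-style input read from stdin.
# Assumption: the "Local Address:Port" value is the 4th whitespace-separated
# column, as in the default `ss -ltn` layout.
port_listed() {
  local port=$1
  awk -v suffix=":${port}" '
    # Match only when column 4 ends with ":<port>", so that 5001 does not
    # match 50010 and the header line is ignored.
    length($4) >= length(suffix) &&
      substr($4, length($4) - length(suffix) + 1) == suffix { found = 1 }
    END { exit !found }
  '
}

# Example use on the node that failed to start (run manually):
#   if ss -ltn | port_listed 50010; then
#     echo "port 50010 is already bound; run ss -ltnp as root to see the owner"
#   fi
```

Reading from stdin keeps the parsing separate from the `ss` call, so the same function can also be applied to saved output collected from each of the 21 nodes.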
--
This message was sent by Atlassian Jira
(v8.20.10#820010)