[ https://issues.apache.org/jira/browse/IOTDB-5063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17714494#comment-17714494 ]

Jinrui Zhang commented on IOTDB-5063:
-------------------------------------

Suggest re-testing the scenario.
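For the re-test, here is a minimal sketch of a variant of the `4_start_data_node.sh` script quoted in the issue. It keeps the same `cluster_dir`, `cur_cluster`, `u_name` values and the `datanode.txt` host list. Before starting each DataNode, it asks that host over ssh whether anything is already listening on port 50010 (the port in the `Failed to bind` error) and skips the host with a message if so. These are assumptions, not taken from the report: `ss` (iproute2) is installed on every DataNode host, and 50010 is the right port to check on all of them. The names `start_datanodes` and `ratis_port` are hypothetical.

```shell
#!/bin/bash
# Hypothetical variant of 4_start_data_node.sh with a pre-flight port check.
# Assumptions: `ss` exists on every DataNode host; 50010 is the port to check.
cluster_dir="/data/iotdb"
cur_cluster="m_1127_4d7c15d"
u_name="root"
ratis_port=50010

start_datanodes() {
  local hosts_file=$1 line
  # Read hosts on fd 3 (as the original script does) so ssh cannot consume them.
  exec 3<"$hosts_file"
  # The `|| [ -n ... ]` also handles a last line without a trailing newline.
  while read -r line <&3 || [ -n "$line" ]; do
    [ -z "$line" ] && continue
    # Succeeds on the remote side if something already listens on the port.
    if ssh "${u_name}@${line}" "ss -ltn | grep -q ':${ratis_port} '"; then
      echo "skip ${line}: port ${ratis_port} already in use" >&2
      continue
    fi
    ssh "${u_name}@${line}" "source /etc/profile;${cluster_dir}/${cur_cluster}/sbin/start-datanode.sh > /dev/null 2>&1 &"
    sleep 1
  done
  exec 3<&-
}

# Usage: start_datanodes datanode.txt
```

With this variant, an occupied port shows up as an explicit message at launch time instead of only in the DataNode log after the node ends up Unknown.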

> [ start datanode ] Failed to start Grpc server
> ----------------------------------------------
>
>                 Key: IOTDB-5063
>                 URL: https://issues.apache.org/jira/browse/IOTDB-5063
>             Project: Apache IoTDB
>          Issue Type: Bug
>          Components: mpp-cluster
>    Affects Versions: 0.14.0-SNAPSHOT
>            Reporter: 刘珍
>            Assignee: Jinrui Zhang
>            Priority: Blocker
>         Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png, screenshot-4.png, screenshot-5.png
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> master : 1127_4d7c15d
> 1. Start 3 ConfigNodes.
> 2. Start 21 DataNodes. One DataNode always fails to start ({color:#DE350B}reproduced in all 3 attempts{color}), with 2 different errors:
> Error 1 (occurred twice):
> 2022-11-28 09:44:11,906 [main] ERROR o.a.ratis.util.ExitUtils:133 - Terminating with exit status 1: Failed to start Grpc server
> java.io.IOException: Failed to bind to address 0.0.0.0/0.0.0.0:50010
>         at org.apache.ratis.thirdparty.io.grpc.netty.NettyServer.start(NettyServer.java:328)
>         at org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl.start(ServerImpl.java:183)
>         at org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl.start(ServerImpl.java:92)
>         at org.apache.ratis.grpc.server.GrpcService.startImpl(GrpcService.java:266)
>         at org.apache.ratis.util.LifeCycle.startAndTransition(LifeCycle.java:270)
>         at org.apache.ratis.server.RaftServerRpcWithProxy.start(RaftServerRpcWithProxy.java:72)
>         at org.apache.ratis.server.impl.RaftServerProxy.startImpl(RaftServerProxy.java:394)
>         at org.apache.ratis.util.LifeCycle.startAndTransition(LifeCycle.java:270)
>         at org.apache.ratis.server.impl.RaftServerProxy.start(RaftServerProxy.java:387)
>         at org.apache.iotdb.consensus.ratis.RatisConsensus.start(RatisConsensus.java:156)
>         at org.apache.iotdb.db.service.DataNode.active(DataNode.java:319)
>         at org.apache.iotdb.db.service.DataNode.doAddNode(DataNode.java:162)
>         at org.apache.iotdb.db.service.DataNodeServerCommandLine.run(DataNodeServerCommandLine.java:95)
>         at org.apache.iotdb.commons.ServerCommandLine.doMain(ServerCommandLine.java:58)
>         at org.apache.iotdb.db.service.DataNode.main(DataNode.java:132)
> Caused by: org.apache.ratis.thirdparty.io.netty.channel.unix.Errors$NativeIoException: bind(..) failed: Address already in use
> 2022-11-28 09:44:11,910 [Thread-0] ERROR o.a.ratis.util.ExitUtils:133 - Terminating with exit status -1: Thread[Thread-0,5,main] has thrown an uncaught exception
> java.lang.NullPointerException: null
>         at org.apache.iotdb.db.service.IoTDBShutdownHook.run(IoTDBShutdownHook.java:60)
> Port information of the DataNode process on this node:
>   !screenshot-2.png! 
> Error 2 (occurred once):
>  !screenshot-3.png! 
> Port information of the DataNode process on this node:
>  !screenshot-4.png! 
> Port information of a DataNode that started successfully:
>  !screenshot-5.png! 
> Test environment: private cloud (phase 1), 8 cores / 32 GB, 24 machines
> 1. ConfigNode configuration
> MAX_HEAP_SIZE="20G"
> MAX_DIRECT_MEMORY_SIZE="6G"
> 2. DataNode configuration
> MAX_HEAP_SIZE="20G"
> MAX_DIRECT_MEMORY_SIZE="6G"
> 3. Common configuration
> schema_replication_factor=3
> data_replication_factor=3
> 4. Start 3 ConfigNodes (ip23, 24, 25)
> 5. Start 21 DataNodes with the startup script below (the 21 DataNode start commands are issued 1 second apart):
> [root@i-66xazbht deploy_mpp_scripts]# cat 4_start_data_node.sh
> #!/bin/bash
> cluster_dir="/data/iotdb"
> cur_cluster="m_1127_4d7c15d"
> u_name="root"
> exec 3<datanode.txt
> while read line <&3
> do
> ssh ${u_name}@${line} "source /etc/profile;${cluster_dir}/${cur_cluster}/sbin/start-datanode.sh > /dev/null 2>&1 &"
> sleep 1
> done
> 6. When checking the cluster information, one DataNode is always shown as Unknown; go to that node and check its log:
>   !screenshot-1.png! 
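On the node that shows as Unknown, you can also check whether port 50010 (from the bind error in the report) is already taken with a short probe. Here is a minimal sketch using bash's built-in `/dev/tcp` redirection. It assumes bash was built with network redirections enabled, which most Linux distributions do. `port_listening` is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: succeeds (returns 0) if something accepts TCP
# connections on host:port, fails otherwise. Relies on bash's /dev/tcp
# redirection, so no external tool is needed.
port_listening() {
  local host=$1 port=$2
  # Open, then immediately drop, a TCP connection on fd 3 inside a subshell;
  # the subshell's exit status is the result of the connect attempt.
  (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null
}

# Run this BEFORE starting the DataNode: once it is up, its own Ratis server
# will make the probe succeed. 127.0.0.1 catches wildcard (0.0.0.0) listeners;
# also pass the node's own IP to catch listeners bound only to that address.
if port_listening 127.0.0.1 50010; then
  echo "port 50010 is already in use"
else
  echo "port 50010 looks free"
fi
```

The probe only detects listeners reachable at the address you pass. It complements, rather than replaces, inspecting the process that owns the port.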



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
