刘珍 created IOTDB-4387:
-------------------------
Summary: [ IoTDB-ConfigNodeRPC-Processor ] Using too many
IoTDB-ConfigNodeRPC-Processor threads
Key: IOTDB-4387
URL: https://issues.apache.org/jira/browse/IOTDB-4387
Project: Apache IoTDB
Issue Type: Bug
Components: mpp-cluster
Reporter: 刘珍
Attachments: 0913_ip31_confignode_stack.out,
benchmark_down_datanode.conf, down_datanode.sh
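master_0909_bdd7ca8, 3C9D (3 ConfigNodes, 9 DataNodes)
With 9 DataNodes, one DataNode is taken offline at a time at 30-minute intervals, and each stays offline for 10 minutes.
After the client finished and its process exited, the thread information of the ConfigNode (leader) was checked:
Total number of threads: 673
Number of IoTDB-ConfigNodeRPC-Processor threads: 597
After the client disconnects, the number of IoTDB-ConfigNodeRPC-Processor threads does not decrease.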
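The two counts above can be reproduced from a jstack dump such as the attached 0913_ip31_confignode_stack.out. The following is a minimal sketch, assuming the usual jstack layout in which each thread's header line starts with the quoted thread name. `count_threads` is a hypothetical helper written for illustration and is not part of IoTDB.

```shell
#!/bin/bash
# Count threads in a jstack dump whose name starts with a given prefix.
# Assumption: every thread header line begins with the thread name in
# double quotes, e.g. "IoTDB-ConfigNodeRPC-Processor-1" #42 daemon ...
# An empty prefix counts every thread header line in the dump.
count_threads() {
  local dump_file=$1 prefix=$2
  # index(...) == 1 matches the prefix literally at the start of the line.
  awk -v p="\"${prefix}" 'index($0, p) == 1 { n++ } END { print n + 0 }' "${dump_file}"
}

# Example usage against the attached dump (not run here):
#   count_threads 0913_ip31_confignode_stack.out ""
#   count_threads 0913_ip31_confignode_stack.out IoTDB-ConfigNodeRPC-Processor
```

Taking the same counts before the benchmark starts and after the client exits would show whether the processor threads are released when connections close.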
Test procedure
1. ConfigNode machines
172.20.70.31 (leader): 8 cores, 32 GB
172.20.70.32/33: 4 cores, 16 GB
ConfigNode configuration parameters:
MAX_HEAP_SIZE="8G"
MAX_DIRECT_MEMORY_SIZE="4G"
schema_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus
data_region_consensus_protocol_class=org.apache.iotdb.consensus.multileader.MultiLeaderConsensus
schema_replication_factor=3
data_replication_factor=3
See the attachment for the jstack output.
2. DataNode machines
172.20.70.2/3/4/5/13/14/16/18/19: 8 cores, 32 GB
Configuration parameters:
MAX_HEAP_SIZE="20G"
MAX_DIRECT_MEMORY_SIZE="6G"
max_connection_for_internal_service=200
wal_buffer_size_in_byte=1048576
enable_timed_flush_seq_memtable=true
seq_memtable_flush_interval_in_ms=3600000
seq_memtable_flush_check_interval_in_ms=600000
enable_timed_flush_unseq_memtable=true
unseq_memtable_flush_interval_in_ms=3600000
unseq_memtable_flush_check_interval_in_ms=600000
max_waiting_time_when_insert_blocked=3600000
query_timeout_threshold=3600000
3. Benchmark configuration: see the attachment
CLIENT_NUMBER=50
Run the benchmark.
4. Run the down-DataNode script
cat down_datanode.sh
#!/bin/bash
node1="172.20.70.4"
node2="172.20.70.5"
node3="172.20.70.3"
node4="172.20.70.2"
node5="172.20.70.13"
node6="172.20.70.14"
node7="172.20.70.16"
node8="172.20.70.18"
node9="172.20.70.19"
cluster_dir="/data/iotdb"
cur_cluster="master_0909_bdd7ca8"
u_name="cluster"
function down_datanode()
{
    # Log the time at which this round starts.
    t=$(date '+%Y-%m-%d %H:%M:%S')
    echo "${t}"
    node=$1
    # Record the cluster and region state before stopping the node.
    ${cluster_dir}/${cur_cluster}/datanode/sbin/start-cli.sh -h ${node} -e "show cluster"
    ${cluster_dir}/${cur_cluster}/datanode/sbin/start-cli.sh -h ${node} -e "show regions"
    # Stop the DataNode and keep it offline for 10 minutes.
    ssh ${u_name}@${node} "source /etc/profile;${cluster_dir}/${cur_cluster}/datanode/sbin/stop-datanode.sh"
    sleep 10m
    # Restart the DataNode, then wait 30 minutes before the next round.
    ssh ${u_name}@${node} "source /etc/profile;${cluster_dir}/${cur_cluster}/datanode/sbin/start-datanode.sh > /dev/null 2>&1 &"
    sleep 30m
}
sleep 30m
down_datanode ${node1}
down_datanode ${node2}
down_datanode ${node3}
down_datanode ${node4}
down_datanode ${node5}
down_datanode ${node6}
down_datanode ${node7}
down_datanode ${node8}
down_datanode ${node9}
5. The benchmark ran for 16.25 hours
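For reference, the nominal running time of the down_datanode.sh schedule can be worked out from its sleep calls: one initial 30-minute wait, then 10 minutes offline plus 30 minutes of waiting for each of the 9 nodes. The sketch below covers only that arithmetic; `schedule_minutes` is a hypothetical helper, and the time spent in the CLI and ssh calls is ignored.

```shell
#!/bin/bash
# Estimate the nominal duration of the down/restart schedule, in minutes.
# Arguments: node count, initial wait, downtime per node, wait after restart.
# Assumption: only the sleep calls contribute; CLI and ssh time is ignored.
schedule_minutes() {
  local nodes=$1 initial=$2 down=$3 wait_after=$4
  echo $(( initial + nodes * (down + wait_after) ))
}

schedule_minutes 9 30 10 30   # prints 390 (minutes), i.e. 6.5 hours
```

The report does not state when the script was started relative to the benchmark, so this figure only bounds the script's own sleep time.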
--
This message was sent by Atlassian Jira
(v8.20.10#820010)