刘珍 created IOTDB-4387:
-------------------------

             Summary: [ IoTDB-ConfigNodeRPC-Processor ] Using too many 
IoTDB-ConfigNodeRPC-Processor threads
                 Key: IOTDB-4387
                 URL: https://issues.apache.org/jira/browse/IOTDB-4387
             Project: Apache IoTDB
          Issue Type: Bug
          Components: mpp-cluster
            Reporter: 刘珍
         Attachments: 0913_ip31_confignode_stack.out, 
benchmark_down_datanode.conf, down_datanode.sh

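master_0909_bdd7ca8, 3C9D (3 ConfigNodes, 9 DataNodes)
With 9 DataNodes, one DataNode is taken offline every 30 minutes and stays down for 10 minutes.
After the client finished running and its process exited, the thread information of the ConfigNode (leader) was:
Total threads: 673
IoTDB-ConfigNodeRPC-Processor threads: 597
After the clients disconnect, the number of IoTDB-ConfigNodeRPC-Processor threads does not decrease.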
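
The thread counts above can be reproduced from a jstack dump with a sketch like the following. It assumes the usual jstack layout, in which every thread entry starts with the quoted thread name at the beginning of a line; the function name count_threads is hypothetical and not part of IoTDB.

```shell
#!/bin/bash
# Count the threads in a jstack dump whose name starts with a given prefix.
# Assumption: each thread entry begins with a line like
#   "thread-name" #12 prio=5 ...
# An empty prefix matches every thread entry, which gives the total.
count_threads() {
    local dump_file=$1
    local prefix=$2
    # index(...) == 1 means the line starts with the opening quote plus prefix.
    awk -v p="\"${prefix}" 'index($0, p) == 1 { n++ } END { print n + 0 }' "${dump_file}"
}
```

For example, `count_threads 0913_ip31_confignode_stack.out IoTDB-ConfigNodeRPC-Processor` would count the RPC processor threads in the attached dump, and `count_threads 0913_ip31_confignode_stack.out ""` would count all threads.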

Test procedure
1. ConfigNode machines
172.20.70.31 (leader)  8 cores, 32 GB
172.20.70.32/33    4 cores, 16 GB

ConfigNode configuration parameters:
MAX_HEAP_SIZE="8G"
MAX_DIRECT_MEMORY_SIZE="4G"

schema_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus
data_region_consensus_protocol_class=org.apache.iotdb.consensus.multileader.MultiLeaderConsensus
schema_replication_factor=3
data_replication_factor=3

See the attachment for the jstack output.

2. DataNode machines
172.20.70.2/3/4/5/13/14/16/18/19    8 cores, 32 GB
Configuration parameters:
MAX_HEAP_SIZE="20G"
MAX_DIRECT_MEMORY_SIZE="6G"

max_connection_for_internal_service=200
wal_buffer_size_in_byte=1048576
enable_timed_flush_seq_memtable=true
seq_memtable_flush_interval_in_ms=3600000
seq_memtable_flush_check_interval_in_ms=600000

enable_timed_flush_unseq_memtable=true
unseq_memtable_flush_interval_in_ms=3600000
unseq_memtable_flush_check_interval_in_ms=600000

max_waiting_time_when_insert_blocked=3600000
query_timeout_threshold=3600000


3. See the attachment for the benchmark configuration
CLIENT_NUMBER=50
Run the benchmark.
4. Run the down-datanode script
 cat down_datanode.sh
#!/bin/bash
node1="172.20.70.4"
node2="172.20.70.5"
node3="172.20.70.3"
node4="172.20.70.2"
node5="172.20.70.13"
node6="172.20.70.14"
node7="172.20.70.16"
node8="172.20.70.18"
node9="172.20.70.19"

cluster_dir="/data/iotdb"
cur_cluster="master_0909_bdd7ca8"
u_name="cluster"



function down_datanode()
{
    # Log the start time, then record cluster and region state before the node goes down.
    t=$(date '+%Y-%m-%d %H:%M:%S')
    echo "${t}"
    node=$1
    ${cluster_dir}/${cur_cluster}/datanode/sbin/start-cli.sh -h ${node} -e "show cluster"
    ${cluster_dir}/${cur_cluster}/datanode/sbin/start-cli.sh -h ${node} -e "show regions"
    # Stop the DataNode, keep it down for 10 minutes, then restart it.
    ssh ${u_name}@${node} "source /etc/profile;${cluster_dir}/${cur_cluster}/datanode/sbin/stop-datanode.sh"
    sleep 10m
    ssh ${u_name}@${node} "source /etc/profile;${cluster_dir}/${cur_cluster}/datanode/sbin/start-datanode.sh > /dev/null 2>&1 &"
    # Let the cluster run for 30 minutes before the next node is taken down.
    sleep 30m
}
sleep 30m
down_datanode ${node1}
down_datanode ${node2}
down_datanode ${node3}
down_datanode ${node4}
down_datanode ${node5}
down_datanode ${node6}
down_datanode ${node7}
down_datanode ${node8}
down_datanode ${node9}
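
As a rough check of the timeline, the sleep values in the script give a lower bound on how long the whole down/up cycle takes. The sketch below only adds up those sleeps; the function name cycle_minutes is hypothetical, and the time spent in the CLI and ssh commands is ignored.

```shell
#!/bin/bash
# Lower bound, in minutes, on the duration of the down/up cycle:
# one initial sleep, then for each node a down period plus a recovery period.
cycle_minutes() {
    local nodes=$1
    local initial=$2
    local down=$3
    local recover=$4
    echo $(( initial + nodes * (down + recover) ))
}
```

With the values from the script, `cycle_minutes 9 30 10 30` gives 390 minutes, or 6.5 hours.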



5. The benchmark ran for 16.25 hours.





--
This message was sent by Atlassian Jira
(v8.20.10#820010)
