[
https://issues.apache.org/jira/browse/IOTDB-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Song Ziyang reopened IOTDB-3862:
--------------------------------
> [shrink + Region-Migrate ] [pool-16-IoTDB-Region-Migrate-Pool-1] ERROR
> o.a.i.d.s.RegionMigrateService$RegionMigrateTask:349 - add new peer
> TEndPoint(ip:a.b.c.d, port:40010) for region DataRegion[x] failed
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: IOTDB-3862
> URL: https://issues.apache.org/jira/browse/IOTDB-3862
> Project: Apache IoTDB
> Issue Type: Bug
> Components: mpp-cluster
> Affects Versions: 0.14.0-SNAPSHOT
> Reporter: 刘珍
> Assignee: Song Ziyang
> Priority: Major
> Attachments: expand-ip1_datanode_logs.tar.gz, ip3.sh,
> ip3_config.properties, ip4.sh, ip4_config.properties, ip5.sh,
> ip5_config.properties, shrink-ip3_confignode_log_all.tar.gz,
> shrink-ip3_datanode_log_all.tar.gz
>
>
> master_0718_967cde6
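> RatisConsensus, 3 replicas, 3C3D (3 ConfigNodes, 3 DataNodes)
> Start the 3C3D cluster (ip3/4/5), scale out by adding 1 DataNode (ip1), then run the shrink operation from ip5 to remove ip3.
> The DataNode on ip3 logs an ERROR: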
> 2022-07-18 {color:red}*14:00:00*{color},830
> [pool-16-IoTDB-Region-Migrate-Pool-1] ERROR
> o.a.i.d.s.RegionMigrateService$RegionMigrateTask:349 - add new peer
> TEndPoint(ip:192.168.130.1, port:40010){color:red}* for region DataRegion[5]
> failed*{color}
> The newly added node ip1 logs a WARN:
> 2022-07-18 {color:red}*13:57:06*{color},554 [grpc-default-executor-1] WARN
> o.a.ratis.util.LogUtils:122 - 192.168.130.1_40010:
> {color:red}*installSnapshot onError,*{color} lastRequest:
> 192.168.130.4_40010->192.168.130.1_40010#20-t1,previous=(t:1,
> i:1790),leaderCommit=1893,initializing? false,entries: size=103, first=(t:1,
> i:1791), STATEMACHINELOGENTRY, 942@client-A8FC5FC601AC:
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: CANCELLED: client
> cancelled
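> The shrink failed; it is still possible to connect to the DataNode being removed and run operations on it.
> Steps to reproduce:
> 1. Start the 3C3D cluster on 192.168.130.3/4/5 (16 cores / 32 GB * 3)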
> schema_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus
> data_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus
> schema_replication_factor=3
> data_replication_factor=3
> MAX_HEAP_SIZE="16G"
> 2. Start 3 benchmark instances, each connected to one of the 3 DataNodes
> The benchmark runs on 192.168.130.2
> /home/benchmark/bm_0620_7ec96c1
> Config files are attached. Start commands:
> nohup sh -x ip3.sh > 3.log &
> nohup sh -x ip4.sh > 4.log &
> nohup sh -x ip5.sh > 5.log &
> After running for about 10 minutes, run the shrink operation on ip5 to remove 192.168.130.3:
> 2022-07-18 *13:56:40*,902 [main] INFO o.a.i.db.service.DataNode:201 - Remove
> result TDataNodeRemoveResp(status:TSStatus(code:200, message:Server accept
> the request))
> 3. See the attachments for the IoTDB logs
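
The three log excerpts quoted in this report come from different nodes and appear out of time order, so a small script may help line them up. The following is a minimal sketch, not part of IoTDB: it assumes each log record fits on one line in the layout "date time,millis [thread] LEVEL logger:line - message", as the excerpts suggest. The names LOG_LINE, parse_line, timeline and SAMPLE are hypothetical, and the sample messages are the quoted ones with the Jira color markup removed and the WARN message abbreviated.

```python
import re
from datetime import datetime

# Assumed record layout, inferred from the excerpts in this report:
# "<date> <time>,<millis> [<thread>] <LEVEL> <logger>:<line> - <message>"
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"\[(?P<thread>[^\]]+)\] (?P<level>[A-Z]+)\s+"
    r"(?P<logger>\S+):(?P<line>\d+) - (?P<msg>.*)$"
)


def parse_line(line):
    """Return a dict for one log record, or None if the line does not match."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["ts"] = datetime.strptime(record["ts"], "%Y-%m-%d %H:%M:%S,%f")
    return record


def timeline(lines):
    """Parse all matching lines and return them sorted by timestamp."""
    records = [r for r in map(parse_line, lines) if r is not None]
    return sorted(records, key=lambda r: r["ts"])


# Excerpts from the report, in the order they are quoted there.
SAMPLE = [
    "2022-07-18 14:00:00,830 [pool-16-IoTDB-Region-Migrate-Pool-1] ERROR "
    "o.a.i.d.s.RegionMigrateService$RegionMigrateTask:349 - add new peer "
    "TEndPoint(ip:192.168.130.1, port:40010) for region DataRegion[5] failed",
    "2022-07-18 13:57:06,554 [grpc-default-executor-1] WARN "
    "o.a.ratis.util.LogUtils:122 - 192.168.130.1_40010: installSnapshot onError,",
    "2022-07-18 13:56:40,902 [main] INFO o.a.i.db.service.DataNode:201 - Remove "
    "result TDataNodeRemoveResp(status:TSStatus(code:200, message:Server accept "
    "the request))",
]

if __name__ == "__main__":
    for rec in timeline(SAMPLE):
        print(rec["ts"].time(), rec["level"], rec["logger"], rec["msg"][:60])
```

Sorted this way, the excerpts read as: remove request accepted on ip5, then the installSnapshot warning on ip1, then the add-peer failure on ip3. Multi-line records such as stack traces are skipped by this sketch.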
--
This message was sent by Atlassian Jira
(v8.20.10#820010)