[ https://issues.apache.org/jira/browse/IOTDB-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Song Ziyang reopened IOTDB-3862:
--------------------------------

> [shrink + Region-Migrate ] [pool-16-IoTDB-Region-Migrate-Pool-1] ERROR 
> o.a.i.d.s.RegionMigrateService$RegionMigrateTask:349 - add new peer 
> TEndPoint(ip:a.b.c.d, port:40010) for region DataRegion[x] failed
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: IOTDB-3862
>                 URL: https://issues.apache.org/jira/browse/IOTDB-3862
>             Project: Apache IoTDB
>          Issue Type: Bug
>          Components: mpp-cluster
>    Affects Versions: 0.14.0-SNAPSHOT
>            Reporter: 刘珍
>            Assignee: Song Ziyang
>            Priority: Major
>         Attachments: expand-ip1_datanode_logs.tar.gz, ip3.sh, 
> ip3_config.properties, ip4.sh, ip4_config.properties, ip5.sh, 
> ip5_config.properties, shrink-ip3_confignode_log_all.tar.gz, 
> shrink-ip3_datanode_log_all.tar.gz
>
>
> master_0718_967cde6
> RatisConsensus, 3 replicas, 3C3D (3 ConfigNodes + 3 DataNodes)
> First start a 3C3D cluster (ip3/4/5), expand it with 1 DataNode (ip1), then on ip5 run the shrink (scale-in) operation to remove ip3.
> The DataNode on ip3 logged an ERROR:
> 2022-07-18 {color:red}*14:00:00*{color},830 
> [pool-16-IoTDB-Region-Migrate-Pool-1] ERROR 
> o.a.i.d.s.RegionMigrateService$RegionMigrateTask:349 - add new peer 
> TEndPoint(ip:192.168.130.1, port:40010){color:red}* for region DataRegion[5] 
> failed*{color}
> The newly added node ip1 logged a WARN:
> 2022-07-18 {color:red}*13:57:06*{color},554 [grpc-default-executor-1] WARN  
> o.a.ratis.util.LogUtils:122 - 192.168.130.1_40010: 
> {color:red}*installSnapshot onError,*{color} lastRequest: 
> 192.168.130.4_40010->192.168.130.1_40010#20-t1,previous=(t:1, 
> i:1790),leaderCommit=1893,initializing? false,entries: size=103, first=(t:1, 
> i:1791), STATEMACHINELOGENTRY, 942@client-A8FC5FC601AC: 
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: CANCELLED: client 
> cancelled
> The shrink failed: clients can still connect to the DataNode that was being removed and run operations on it.
> Steps to reproduce:
> 1. Start a 3C3D cluster on 192.168.130.3/4/5 (3 machines, 16 cores / 32 GB RAM each)
> schema_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus
> data_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus
> schema_replication_factor=3
> data_replication_factor=3
> MAX_HEAP_SIZE="16G"
> 2. Start 3 benchmark instances, each connected to a different DataNode
> The benchmark runs on 192.168.130.2
> /home/benchmark/bm_0620_7ec96c1
> The configuration files are attached. Start commands:
> nohup sh -x ip3.sh > 3.log &
> nohup sh -x ip4.sh > 4.log &
> nohup sh -x ip5.sh > 5.log &
> After running for about 10 minutes, run the shrink operation on ip5 to remove 192.168.130.3:
> 2022-07-18 *13:56:40*,902 [main] INFO  o.a.i.db.service.DataNode:201 - Remove 
> result TDataNodeRemoveResp(status:TSStatus(code:200, message:Server accept 
> the request)) 
> 3. The IoTDB logs are attached.
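For triage, it may help to see why the new peer in the ERROR line is 192.168.130.1. With data_replication_factor=3 and only 3 DataNodes before the expansion, any DataRegion created at that time has a replica on each of ip3, ip4 and ip5. The sketch below assumes (this is not taken from IoTDB's code) that the replacement replica must go to a remaining DataNode that does not already hold the region; the function name migration_candidates is hypothetical and the logic is plain set arithmetic, not IoTDB's actual region-migration algorithm.

```python
# Hypothetical sketch, NOT IoTDB's region-migration logic.
# Assumption: when a DataNode is removed, each region that had a replica on it
# gets a new peer chosen from remaining DataNodes that do not already hold it.


def migration_candidates(all_datanodes, region_replicas, removed):
    """Return the DataNodes that could receive the replica moved off `removed`."""
    if removed not in region_replicas:
        return set()  # this region has no replica on the removed node
    remaining = set(all_datanodes) - {removed}
    return remaining - set(region_replicas)


# Topology from the report: 3C3D on ip3/4/5, then ip1 added, then ip3 removed.
datanodes = {"192.168.130.1", "192.168.130.3", "192.168.130.4", "192.168.130.5"}

# A region created before the expansion occupies all three original DataNodes.
region_before_expansion = {"192.168.130.3", "192.168.130.4", "192.168.130.5"}

print(migration_candidates(datanodes, region_before_expansion, "192.168.130.3"))
# prints {'192.168.130.1'}
```

Under this assumption, ip1 is the only possible target for every pre-expansion region, so a failed snapshot install on ip1 (the installSnapshot WARN above) would block the whole shrink.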



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
