[
https://issues.apache.org/jira/browse/IOTDB-5131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17644775#comment-17644775
]
Yongzao Dan commented on IOTDB-5131:
------------------------------------
This bug occurs because two other DataNodes try to synchronize schema to the
test DataNode, as shown in the following picture:
!image-2022-12-08-09-07-51-007.png|width=416,height=256!
The blue circle is the test ConfigNode, the green circle is the test DataNode,
and the orange circles are two zombie DataNodes. These zombies try to
synchronize schema to the SchemaRegionGroup[1] located on the test DataNode,
which ultimately creates a zombie SchemaRegionGroup[1].
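The situation above can be pictured with a small toy model. This is only a sketch and is not IoTDB or Ratis code: the class ZombiePeerSketch, the nested RegionGroup type, the method names, and the node ids 1, 2 and 3 are all hypothetical. It assumes the test DataNode is the only registered member of the group and flags any other sender of a schema-sync request as an unexpected peer. It does not describe the fix applied for this issue.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

public class ZombiePeerSketch {

    /** Toy stand-in for a schema region group; NOT an IoTDB class. */
    static final class RegionGroup {
        // Node ids that are supposed to belong to the group (assumption: known up front).
        private final Set<Integer> registered;
        // Every node id that has been seen, including unregistered senders.
        private final Set<Integer> observed = new LinkedHashSet<>();

        RegionGroup(Set<Integer> registered) {
            this.registered = new LinkedHashSet<>(registered);
            this.observed.addAll(registered);
        }

        /** Records a sync attempt; returns true when the sender is not a registered member. */
        boolean onSyncRequest(int senderId) {
            observed.add(senderId);
            return !registered.contains(senderId);
        }

        /** Node ids that contacted the group without being registered members. */
        Set<Integer> unexpectedPeers() {
            Set<Integer> extra = new LinkedHashSet<>(observed);
            extra.removeAll(registered);
            return extra;
        }
    }

    public static void main(String[] args) {
        // Hypothetical ids: 1 is the test DataNode, 2 and 3 are the zombie DataNodes.
        RegionGroup group = new RegionGroup(Collections.singleton(1));
        for (int sender : new int[] {1, 2, 3}) {
            if (group.onSyncRequest(sender)) {
                System.out.println("Unexpected sync request from node " + sender);
            }
        }
        System.out.println("Unexpected peers: " + group.unexpectedPeers());
    }
}
```

In a single-replica setup the set of unexpected peers should stay empty; a non-empty set in this model corresponds to the orange circles in the picture.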
> [rel/1.0] 1rep1C1D, Follower is generated in the schema region, and "count
> devices" report an error
> ---------------------------------------------------------------------------------------------------
>
> Key: IOTDB-5131
> URL: https://issues.apache.org/jira/browse/IOTDB-5131
> Project: Apache IoTDB
> Issue Type: Bug
> Components: mpp-cluster
> Affects Versions: 1.0.0
> Reporter: 刘珍
> Assignee: Yongzao Dan
> Priority: Major
> Attachments: image-2022-12-06-19-08-02-946.png,
> image-2022-12-06-19-08-38-711.png, image-2022-12-08-09-07-51-007.png,
> iotdb_5131.conf
>
>
> rel/1.0 20221206
> cbf7291
> 1. Run ./sbin/start-standalone.sh to start a 1-replica 1C1D cluster
> 2. Run BM with the configuration in the attachment
> GROUP_NUMBER=10
> DEVICE_NUMBER=50
> SENSOR_NUMBER=200000
> BATCH_SIZE_PER_WRITE=1
> LOOP=1000
> 3. {color:#de350b}Problem 1: a Follower appears in the schema region{color}
> !image-2022-12-06-19-08-02-946.png|width=848,height=381!
> 4. {color:#de350b}Problem 2: count devices returns an error immediately{color}
> !image-2022-12-06-19-08-38-711.png!
> Backend log
> 2022-12-06 18:55:56,256
> [pool-32-IoTDB-MPPCoordinator-19$20221206_105556_71267_1.1.0] ERROR
> o.a.i.d.m.e.e.RegionReadExecutor:54 - Execute FragmentInstance in
> ConsensusGroup SchemaRegion[0] failed.
> org.apache.iotdb.consensus.exception.RatisRequestFailedException: Ratis
> request failed
> at
> org.apache.iotdb.consensus.ratis.RatisConsensus.read(RatisConsensus.java:323)
> at
> org.apache.iotdb.db.mpp.execution.executor.RegionReadExecutor.execute(RegionReadExecutor.java:46)
> at
> org.apache.iotdb.db.mpp.plan.scheduler.FragmentInstanceDispatcherImpl.dispatchLocally(FragmentInstanceDispatcherImpl.java:224)
> at
> org.apache.iotdb.db.mpp.plan.scheduler.FragmentInstanceDispatcherImpl.dispatchOneInstance(FragmentInstanceDispatcherImpl.java:137)
> at
> org.apache.iotdb.db.mpp.plan.scheduler.FragmentInstanceDispatcherImpl.lambda$dispatchRead$0(FragmentInstanceDispatcherImpl.java:102)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.ratis.protocol.exceptions.ServerNotReadyException:
> 1@group-000200000000 is not in [RUNNING]: current state is CLOSED
> at
> org.apache.ratis.server.impl.RaftServerImpl.lambda$assertLifeCycleState$9(RaftServerImpl.java:733)
> at org.apache.ratis.util.LifeCycle.assertCurrentState(LifeCycle.java:253)
> at
> org.apache.ratis.server.impl.RaftServerImpl.assertLifeCycleState(RaftServerImpl.java:732)
> at
> org.apache.ratis.server.impl.RaftServerImpl.submitClientRequestAsync(RaftServerImpl.java:822)
> at
> org.apache.ratis.server.impl.RaftServerImpl.submitClientRequest(RaftServerImpl.java:944)
> at
> org.apache.ratis.server.impl.RaftServerProxy.submitClientRequest(RaftServerProxy.java:442)
> at
> org.apache.iotdb.consensus.ratis.RatisConsensus.read(RatisConsensus.java:318)
> ... 8 common frames omitted
> Test environment
> 1. fit machine 192.168.130.1 8C32GB
> Installation directory: /data1/relv1_1206_cbf7291
> ConfigNode configuration
> MAX_HEAP_SIZE="2G"
> DataNode configuration
> MAX_HEAP_SIZE="20G"
> MAX_DIRECT_MEMORY_SIZE="6G"
> dn_wal_dirs=/data2/wal,/data3/wal
> 2. BM configuration: see attachment
> 192.168.130.2 /data1/bm_v1
>
>
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)