[
https://issues.apache.org/jira/browse/IOTDB-4027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
刘珍 reopened IOTDB-4027:
-----------------------
master_0823_57428ae, after restarting the failed node:
2022-08-23 13:25:55,275 [172.20.70.3_40010@group-000100000001-StateMachineUpdater] ERROR o.a.i.d.e.s.SnapshotTaker:104 - Exception occurs when taking snapshot for root.ip4.g_0-1
{color:#DE350B}*java.io.FileNotFoundException:*{color} /data/iotdb/master_0823_57428ae/datanode/./sbin/../data/consensus/data_region/47474747-4747-4747-4747-000100000001/sm/2_816838/snapshot.log (No such file or directory)
at java.io.FileOutputStream.open0(Native Method)
at java.io.FileOutputStream.open(FileOutputStream.java:270)
at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
at java.io.FileOutputStream.<init>(FileOutputStream.java:162)
at org.apache.iotdb.db.engine.snapshot.SnapshotLogger.<init>(SnapshotLogger.java:38)
at org.apache.iotdb.db.engine.snapshot.SnapshotTaker.takeFullSnapshot(SnapshotTaker.java:77)
at org.apache.iotdb.db.consensus.statemachine.DataRegionStateMachine.takeSnapshot(DataRegionStateMachine.java:97)
at org.apache.iotdb.consensus.ratis.ApplicationStateMachineProxy.takeSnapshot(ApplicationStateMachineProxy.java:183)
at org.apache.ratis.server.impl.StateMachineUpdater.takeSnapshot(StateMachineUpdater.java:270)
at org.apache.ratis.server.impl.StateMachineUpdater.checkAndTakeSnapshot(StateMachineUpdater.java:262)
at org.apache.ratis.server.impl.StateMachineUpdater.run(StateMachineUpdater.java:183)
at java.lang.Thread.run(Thread.java:748)
2022-08-23 13:25:55,276 [172.20.70.3_40010@group-000100000001-StateMachineUpdater] ERROR o.a.i.d.e.s.SnapshotTaker:114 - Failed to close snapshot logger
{color:#DE350B}*java.lang.NullPointerException: null*{color}
at org.apache.iotdb.db.engine.snapshot.SnapshotTaker.takeFullSnapshot(SnapshotTaker.java:112)
at org.apache.iotdb.db.consensus.statemachine.DataRegionStateMachine.takeSnapshot(DataRegionStateMachine.java:97)
at org.apache.iotdb.consensus.ratis.ApplicationStateMachineProxy.takeSnapshot(ApplicationStateMachineProxy.java:183)
at org.apache.ratis.server.impl.StateMachineUpdater.takeSnapshot(StateMachineUpdater.java:270)
at org.apache.ratis.server.impl.StateMachineUpdater.checkAndTakeSnapshot(StateMachineUpdater.java:262)
at org.apache.ratis.server.impl.StateMachineUpdater.run(StateMachineUpdater.java:183)
at java.lang.Thread.run(Thread.java:748)
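
Read together, the two traces above look like one failure followed by a secondary one: the log file cannot be opened because its parent directory (sm/2_816838) does not exist, and the cleanup path then tries to close a logger that was never constructed. The following is only a minimal sketch of a pattern that avoids both symptoms, not the actual IoTDB code or the fix that was merged. The names SnapshotLogSketch, LogWriter, and takeSnapshot are hypothetical. It assumes that creating the missing directory is acceptable, and it uses try-with-resources so that close() is never called on a writer whose constructor threw.

```java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

public class SnapshotLogSketch {

  /** Hypothetical stand-in for a snapshot log writer; NOT the IoTDB SnapshotLogger. */
  static final class LogWriter implements AutoCloseable {
    private final OutputStream out;

    LogWriter(File logFile) throws IOException {
      // Opening a file for writing does not create missing parent directories,
      // so create them explicitly before opening the stream.
      File parent = logFile.getAbsoluteFile().getParentFile();
      Files.createDirectories(parent.toPath());
      this.out = new BufferedOutputStream(Files.newOutputStream(logFile.toPath()));
    }

    void logLine(String line) throws IOException {
      out.write((line + "\n").getBytes(StandardCharsets.UTF_8));
    }

    @Override
    public void close() throws IOException {
      out.close();
    }
  }

  /** Writes a one-line snapshot log; returns false instead of throwing on I/O failure. */
  static boolean takeSnapshot(File snapshotDir) {
    File logFile = new File(snapshotDir, "snapshot.log");
    // If the constructor throws, the resource is never assigned and close() is
    // not invoked, so there is no null reference left to close afterwards.
    try (LogWriter writer = new LogWriter(logFile)) {
      writer.logLine("example-entry");
      return true;
    } catch (IOException e) {
      return false;
    }
  }

  public static void main(String[] args) throws IOException {
    File base = Files.createTempDirectory("snapshot-sketch").toFile();
    // The directory below does not exist yet, mirroring the path in the trace.
    File missingDir = new File(base, "sm/2_816838");
    System.out.println(takeSnapshot(missingDir));
  }
}
```

Whether the directory should be created here or should already exist at this point is a design question for the snapshot code itself; the sketch only shows that the secondary NullPointerException can be avoided regardless of the answer.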
> ERROR o.a.i.d.e.s.SnapshotLoader:94 - Exception occurs when creating links
> from snapshot directory to data directory
> ---------------------------------------------------------------------------------------------------------------------
>
> Key: IOTDB-4027
> URL: https://issues.apache.org/jira/browse/IOTDB-4027
> Project: Apache IoTDB
> Issue Type: Bug
> Components: mpp-cluster
> Affects Versions: 0.14.0-SNAPSHOT
> Reporter: 刘珍
> Assignee: Song Ziyang
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.14.0
>
> Attachments: image-2022-08-03-09-39-10-230.png,
> image-2022-08-03-09-39-48-739.png, ip18_befor_stop_datanode_log.tar.gz,
> ip18_restart_with-error_log.tar.gz, ip4_2000_config.properties,
> screenshot-1.png
>
>
> master_0801_55b5b17
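> Problem description
> RatisConsensus, 3 replicas, 3C9D. One benchmark (bm) client connects to one datanode and performs concurrent writes. Stop one follower node and start it again after 5 minutes; {color:#DE350B}*then stop another follower node and start it again after 10 minutes. This node reports an error during startup and is missing data*{color}: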
> 2022-08-02 18:04:17,376 [pool-4-thread-1] ERROR o.a.i.d.e.s.SnapshotLoader:94 - Exception occurs when creating links from snapshot directory to data directory
> java.io.IOException: Cannot find /data/iotdb/master_0801_2de0dd8/datanode/./sbin/../data/consensus/data_region/47474747-4747-4747-4747-000100000001/sm/1_354536/sequence/root.ip4.g_0 or /data/iotdb/master_0801_2de0dd8/datanode/./sbin/../data/consensus/data_region/47474747-4747-4747-4747-000100000001/sm/1_354536/unsequence/root.ip4.g_0
> at org.apache.iotdb.db.engine.snapshot.SnapshotLoader.createLinksFromSnapshotDirToDataDir(SnapshotLoader.java:163)
> at org.apache.iotdb.db.engine.snapshot.SnapshotLoader.loadSnapshotForStateMachine(SnapshotLoader.java:91)
> at org.apache.iotdb.db.consensus.statemachine.DataRegionStateMachine.loadSnapshot(DataRegionStateMachine.java:93)
> at org.apache.iotdb.consensus.ratis.ApplicationStateMachineProxy.loadSnapshot(ApplicationStateMachineProxy.java:188)
> at org.apache.iotdb.consensus.ratis.ApplicationStateMachineProxy.lambda$initialize$0(ApplicationStateMachineProxy.java:73)
> at org.apache.ratis.util.LifeCycle.startAndTransition(LifeCycle.java:270)
> at org.apache.iotdb.consensus.ratis.ApplicationStateMachineProxy.initialize(ApplicationStateMachineProxy.java:69)
> at org.apache.ratis.server.impl.ServerState.<init>(ServerState.java:136)
> at org.apache.ratis.server.impl.RaftServerImpl.<init>(RaftServerImpl.java:201)
> at org.apache.ratis.server.impl.RaftServerProxy.lambda$newRaftServerImpl$5(RaftServerProxy.java:274)
> at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> 2022-08-02 18:04:17,376 [pool-4-thread-1] ERROR o.a.i.d.c.s.DataRegionStateMachine:95 - Fail to load snapshot from /data/iotdb/master_0801_2de0dd8/datanode/./sbin/../data/consensus/data_region/47474747-4747-4747-4747-000100000001/sm/1_354536
> ip18 is missing data; the expected count value for the series is 20000 points
> !screenshot-1.png!
> 1. Reproduction procedure
> Private cloud: 172.20.70.2/3/4/5/13/14/16/18/19
> The benchmark runs on ip15 (connected to ip4)
> Stop ip4 / start ip4, then stop ip18 / start ip18; ip18 reports an error
> !image-2022-08-03-09-39-10-230.png!
> !image-2022-08-03-09-39-48-739.png!
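> 2. Start the benchmark
> 2022-08-02 17:34:57 start bm
> 3. Stop the datanode on ip4
> 2022-08-02 17:45:42 stop the datanode
> sleep 300
> Start ip4
> 4. Stop the datanode on ip18
> 2022-08-02 17:54:11 stop the datanode on ip18
> sleep 600
> Start ip18
> {color:#DE350B}*An error is reported during startup*{color}:
> See the problem description
> After bm finishes writing and all nodes finish synchronizing, {color:#DE350B}*the ip18 node is missing data*{color}; the data on ip16 and ip4 is correct.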
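
The SnapshotLoader error in the problem description names two candidate directories under the snapshot, sequence/root.ip4.g_0 and unsequence/root.ip4.g_0, and reports that it cannot find them. When triaging a node in this state, it can help to check the snapshot layout directly. The sketch below assumes the layout visible in those paths, that is <snapshotDir>/sequence/<storageGroup> and <snapshotDir>/unsequence/<storageGroup>. The class SnapshotLayoutCheck and the method existingDataDirs are hypothetical helpers, not part of IoTDB, and the sketch makes no claim about the exact condition the real loader tests.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class SnapshotLayoutCheck {

  /** Returns the data directories that exist for the storage group inside the snapshot. */
  static List<Path> existingDataDirs(Path snapshotDir, String storageGroup) {
    List<Path> found = new ArrayList<>();
    // Assumed layout, taken from the paths in the error message.
    for (String kind : new String[] {"sequence", "unsequence"}) {
      Path candidate = snapshotDir.resolve(kind).resolve(storageGroup);
      if (Files.isDirectory(candidate)) {
        found.add(candidate);
      }
    }
    return found;
  }

  public static void main(String[] args) throws IOException {
    // A temporary directory stands in for a snapshot directory such as sm/1_354536.
    Path snapshot = Files.createTempDirectory("snapshot-layout");

    // An empty snapshot directory: neither candidate exists.
    System.out.println(existingDataDirs(snapshot, "root.ip4.g_0"));

    // After creating the sequence directory, one candidate is found.
    Files.createDirectories(snapshot.resolve("sequence").resolve("root.ip4.g_0"));
    System.out.println(existingDataDirs(snapshot, "root.ip4.g_0"));
  }
}
```

An empty result for a snapshot directory that Ratis considers the latest would match the symptom reported here, where the snapshot directory is present but the data it should contain is not.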