[
https://issues.apache.org/jira/browse/IOTDB-4731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jinrui Zhang reassigned IOTDB-4731:
-----------------------------------
Assignee: Haiming Zhu (was: Jinrui Zhang)
> [Remove-DataNode] Data is inconsistent ( remove datanode before the
> synchronization is complete )
> --------------------------------------------------------------------------------------------------
>
> Key: IOTDB-4731
> URL: https://issues.apache.org/jira/browse/IOTDB-4731
> Project: Apache IoTDB
> Issue Type: Bug
> Components: mpp-cluster
> Affects Versions: 0.14.0-SNAPSHOT
> Reporter: 刘珍
> Assignee: Haiming Zhu
> Priority: Major
> Attachments: aft_set_readonlyip68_regions.out,
> bef_set_ip68readonly_regions.out, image-2022-10-24-15-13-45-086.png,
> image-2022-10-24-15-22-17-729.png,
> ip64-is-newpeer_leader_g_4_q_after_remove.out, ip66_g_4_q_after_remove.out,
> more_ts.conf
>
>
> Build: master_1023_2fea011
> Cluster: 3 replicas, 3C3D; the benchmark write has completed.
> Start the fourth DataNode (ip64).
> On ip68, run: SET SYSTEM TO READONLY ON LOCAL
> Remove the ip68 DataNode.
> Before the removal, ip68 is the leader of DataRegion[14] and still holds
> data that has not been synchronized to the other replicas:
> !image-2022-10-24-15-13-45-086.png!
> While ip68 is in the Removing state, its DataNode log shows these errors:
> 2022-10-24 14:18:18,092 [pool-49-IoTDB-LogDispatcher-DataRegion[14]-3] ERROR o.a.i.d.w.n.WALNode$PlanNodeIterator:590 - Fail to read wal from wal file /data/liuzhen_test/master_1023_2fea011/sbin/../data/datanode/wal/root.test.g_4-14/_150-200-1.wal, skip this file.
> java.nio.channels.ClosedByInterruptException: null
> at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
> at sun.nio.ch.FileChannelImpl.size(FileChannelImpl.java:315)
> at org.apache.iotdb.db.wal.io.WALByteBufReader.<init>(WALByteBufReader.java:47)
> at org.apache.iotdb.db.wal.node.WALNode$PlanNodeIterator.hasNext(WALNode.java:552)
> at org.apache.iotdb.db.wal.node.WALNode$PlanNodeIterator.next(WALNode.java:683)
> at org.apache.iotdb.consensus.multileader.logdispatcher.LogDispatcher$LogDispatcherThread.constructBatchFromWAL(LogDispatcher.java:438)
> at org.apache.iotdb.consensus.multileader.logdispatcher.LogDispatcher$LogDispatcherThread.getBatch(LogDispatcher.java:348)
> at org.apache.iotdb.consensus.multileader.logdispatcher.LogDispatcher$LogDispatcherThread.run(LogDispatcher.java:274)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> 2022-10-24 14:18:18,093 [pool-49-IoTDB-LogDispatcher-DataRegion[14]-3] ERROR o.a.i.d.w.n.WALNode$PlanNodeIterator:590 - Fail to read wal from wal file /data/liuzhen_test/master_1023_2fea011/sbin/../data/datanode/wal/root.test.g_4-14/_151-204-1.wal, skip this file.
> java.nio.channels.ClosedByInterruptException: null
> at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
> at sun.nio.ch.FileChannelImpl.size(FileChannelImpl.java:315)
> at org.apache.iotdb.db.wal.io.WALByteBufReader.<init>(WALByteBufReader.java:47)
> at org.apache.iotdb.db.wal.node.WALNode$PlanNodeIterator.hasNext(WALNode.java:552)
> at org.apache.iotdb.db.wal.node.WALNode$PlanNodeIterator.hasNext(WALNode.java:597)
> at org.apache.iotdb.db.wal.node.WALNode$PlanNodeIterator.next(WALNode.java:683)
> at org.apache.iotdb.consensus.multileader.logdispatcher.LogDispatcher$LogDispatcherThread.constructBatchFromWAL(LogDispatcher.java:438)
> at org.apache.iotdb.consensus.multileader.logdispatcher.LogDispatcher$LogDispatcherThread.getBatch(LogDispatcher.java:348)
> at org.apache.iotdb.consensus.multileader.logdispatcher.LogDispatcher$LogDispatcherThread.run(LogDispatcher.java:274)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> 2022-10-24 14:18:18,093 [pool-49-IoTDB-LogDispatcher-DataRegion[14]-3] ERROR o.a.i.c.m.l.LogDispatcher$LogDispatcherThread:294 - Unexpected error in logDispatcher for peer Peer{groupId=DataRegion[14], endpoint=TEndPoint(ip:192.168.10.64, port:40010), nodeId=6}
> java.lang.ArrayIndexOutOfBoundsException: 29
> at org.apache.iotdb.db.wal.node.WALNode$PlanNodeIterator.hasNext(WALNode.java:530)
> at org.apache.iotdb.db.wal.node.WALNode$PlanNodeIterator.hasNext(WALNode.java:597)
> at org.apache.iotdb.db.wal.node.WALNode$PlanNodeIterator.hasNext(WALNode.java:597)
> at org.apache.iotdb.db.wal.node.WALNode$PlanNodeIterator.next(WALNode.java:683)
> at org.apache.iotdb.consensus.multileader.logdispatcher.LogDispatcher$LogDispatcherThread.constructBatchFromWAL(LogDispatcher.java:438)
> at org.apache.iotdb.consensus.multileader.logdispatcher.LogDispatcher$LogDispatcherThread.getBatch(LogDispatcher.java:348)
> at org.apache.iotdb.consensus.multileader.logdispatcher.LogDispatcher$LogDispatcherThread.run(LogDispatcher.java:274)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> After ip68 has been removed, query DataRegion[14]:
> "select count(s_0) from root.test.g_4.** align by device"
> ip64 (DataRegion[14] leader): 100 points/sensor
> Stop the ip64 DataNode and run the same query again:
> "select count(s_0) from root.test.g_4.** align by device"
> ip66 (48 rows, fewer than the 100 points):
> !image-2022-10-24-15-22-17-729.png!
> Test environment
> 1. 192.168.10.62 / 66 / 68: 72 CPU, 256 GB
> benchmark: ip64 /data/liuzhen_test/weektest/benchmark_tool
> ConfigNode
> MAX_HEAP_SIZE="8G"
> schema_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus
> data_region_consensus_protocol_class=org.apache.iotdb.consensus.multileader.MultiLeaderConsensus
> schema_replication_factor=3
> data_replication_factor=3
> DataNode
> MAX_HEAP_SIZE="192G"
> MAX_DIRECT_MEMORY_SIZE="32G"
> query_timeout_threshold=36000000
> 2. Benchmark configuration: see the attachment.
> 3. After the benchmark write is done:
> ip68 CLI: "flush"
> ip68 CLI: SET SYSTEM TO READONLY ON LOCAL
> Start the ip64 DataNode.
> remove-datanode.sh "ip68's NodeID"
> 4. View the ip68 DataNode logs.
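
For step 4, a small script can list which WAL files the log reports as skipped, so they can be compared with the points missing on the other replicas. This is only a sketch: the function name `skipped_wal_files` is hypothetical, and the pattern is taken from the "Fail to read wal from wal file ..., skip this file." message quoted above, assuming the raw DataNode log (not this quoted mail) is the input.

```python
import re

# Pattern mirrors the ERROR message quoted in this report. \s+ and \s*
# tolerate line wrapping between the message and the file path.
SKIPPED_WAL = re.compile(
    r"Fail to read wal from wal file\s+(\S+?),\s*skip this file"
)


def skipped_wal_files(log_text):
    """Return the distinct WAL file paths reported as skipped, in log order."""
    seen = []
    for path in SKIPPED_WAL.findall(log_text):
        if path not in seen:  # the same file may be reported more than once
            seen.append(path)
    return seen


if __name__ == "__main__":
    import sys

    # Usage: python skipped_wal.py <path to the DataNode log file>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        for wal in skipped_wal_files(f.read()):
            print(wal)
```

Applied to the two ERROR entries above, this would report `_150-200-1.wal` and `_151-204-1.wal` under `root.test.g_4-14`. It says nothing about why the reads failed; it only collects the file names.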