刘珍 created IOTDB-4380:
-------------------------
Summary: delete storage group : wal file corrupt
o.a.i.d.w.n.WALNode$PlanNodeIterator:695 - timeout when waiting for next WAL
entry ready, execute rollWALFile.
Key: IOTDB-4380
URL: https://issues.apache.org/jira/browse/IOTDB-4380
Project: Apache IoTDB
Issue Type: Bug
Components: mpp-cluster
Affects Versions: 0.14.0-SNAPSHOT
Reporter: 刘珍
Assignee: 张洪胤
Attachments: more_metadata.conf
m_0908_7915b3f.
Problem description
The DataNode fails to restart:
2022-09-09 16:32:00,011 [pool-33-IoTDB-LogDispatcher-DataRegion[12]-2] INFO
o.a.i.d.w.n.WALNode$PlanNodeIterator:695 - timeout when waiting for next WAL
entry ready, execute rollWALFile. Current search index in wal
buffer is 2959, and next target index is 2501
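MultiLeaderConsensus, 3 replicas, 3 nodes
1. While metadata is being created, kill ip74.
The benchmark configuration file is attached.
2. Clear the OS cache on ip74 and start the DataNode on ip74.
3. Run the benchmark again with the same configuration, with IS_DELETE_DATA=true.
When this parameter is true, the benchmark first executes delete storage group root.test.*;
After the benchmark finishes, stop the DataNode service on ip74.
Back up data to /data/mpp_test/m_0908_7915b3f/datanode/data_for_recovery_Test
4. Clear the OS cache on ip74 and start the DataNode service.
Run the benchmark again with the same configuration. After the benchmark finishes,
check the ip74 log, which shows: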
2022-09-09 15:43:13,691 [pool-23-IoTDB-MPPDataExchangeRPC-Processor-40] ERROR
o.a.t.ProcessFunction:47 - Internal error processing getDataBlock
org.apache.thrift.TException: Source fragment instance not found. Fragment
instance ID: TFragmentInstanceId(queryId:20220909_074205_19400_3, fragmentId:2,
instanceId:0).
at
org.apache.iotdb.db.mpp.execution.exchange.MPPDataExchangeManager$MPPDataExchangeServiceImpl.getDataBlock(MPPDataExchangeManager.java:90)
at
org.apache.iotdb.mpp.rpc.thrift.MPPDataExchangeService$Processor$getDataBlock.getResult(MPPDataExchangeService.java:326)
at
org.apache.iotdb.mpp.rpc.thrift.MPPDataExchangeService$Processor$getDataBlock.getResult(MPPDataExchangeService.java:306)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38)
at
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:248)
at
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
2022-09-09 15:43:15,312 [20220909_074205_19400_3.2.0.SinkHandle-3074] ERROR
o.a.i.d.m.e.e.SinkHandle:281 - The TsBlock doesn't exist. Sequence ID is 1,
remaining map is
[0=<org.apache.iotdb.tsfile.read.common.block.TsBlock@5f617979,1048576>]
2022-09-09 15:43:17,119 [pool-23-IoTDB-MPPDataExchangeRPC-Processor-22] ERROR
o.a.t.ProcessFunction:47 - Internal error processing getDataBlock
java.lang.IllegalStateException: The data block doesn't exist. Sequence ID: 1
at
org.apache.iotdb.db.mpp.execution.exchange.SinkHandle.getSerializedTsBlock(SinkHandle.java:285)
at
org.apache.iotdb.db.mpp.execution.exchange.MPPDataExchangeManager$MPPDataExchangeServiceImpl.getDataBlock(MPPDataExchangeManager.java:97)
at
org.apache.iotdb.mpp.rpc.thrift.MPPDataExchangeService$Processor$getDataBlock.getResult(MPPDataExchangeService.java:326)
at
org.apache.iotdb.mpp.rpc.thrift.MPPDataExchangeService$Processor$getDataBlock.getResult(MPPDataExchangeService.java:306)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38)
at
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:248)
at
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
5. Stop the DataNode service on ip74.
Back up data to /data/mpp_test/m_0908_7915b3f/datanode/data_for_recovery_Test_2
Clear the OS cache on ip74 and start the DataNode on ip74; startup fails:
2022-09-09 16:44:00,039 [pool-33-IoTDB-LogDispatcher-DataRegion[12]-2] INFO
o.a.i.d.w.n.WALNode$PlanNodeIterator:695 - timeout when waiting for next WAL
entry ready, execute rollWALFile. Current search index in wal buffer is 2959,
and next target index is 2501
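
To make it easier to spot this condition in a large DataNode log, here is a minimal Python sketch that scans log text for the timeout message quoted above and reports the cases where the WAL buffer's search index is already past the target index. The function name find_stalled_wal_reads is hypothetical, the pattern relies only on the message wording shown in this report, and treating "buffer index greater than target index" as the suspicious case is an assumption made for triage, not a confirmed diagnosis.

```python
import re

# Message wording copied from the log in this report. \s+ between words lets
# the pattern match even when the mail client wrapped the line.
WAL_TIMEOUT = re.compile(
    r"Current\s+search\s+index\s+in\s+wal\s+buffer\s+is\s+(\d+),"
    r"\s+and\s+next\s+target\s+index\s+is\s+(\d+)"
)


def find_stalled_wal_reads(log_text):
    """Return (buffer_index, target_index) pairs where buffer_index > target_index.

    Assumption: such a pair means the iterator is waiting for an entry that
    the buffer has already moved past.
    """
    pairs = []
    for match in WAL_TIMEOUT.finditer(log_text):
        buffer_index = int(match.group(1))
        target_index = int(match.group(2))
        if buffer_index > target_index:
            pairs.append((buffer_index, target_index))
    return pairs


sample = (
    "o.a.i.d.w.n.WALNode$PlanNodeIterator:695 - timeout when waiting for next WAL\n"
    "entry ready, execute rollWALFile. Current search index in wal buffer is 2959,\n"
    "and next target index is 2501\n"
)
print(find_stalled_wal_reads(sample))  # [(2959, 2501)]
```

Run against the full DataNode log, repeated identical pairs would suggest the dispatcher is retrying the same target index without making progress.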
--
This message was sent by Atlassian Jira
(v8.20.10#820010)