刘珍 created IOTDB-3247:
-------------------------
Summary: [wal recovery] Aligned sensors, query lost data
Key: IOTDB-3247
URL: https://issues.apache.org/jira/browse/IOTDB-3247
Project: Apache IoTDB
Issue Type: Bug
Components: Core/WAL
Affects Versions: 0.14.0-SNAPSHOT
Reporter: 刘珍
Assignee: Haiming Zhu
Attachments: image-2022-05-20-16-09-01-848.png
master_0519_81b9117
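Problem description:
100 storage groups, 500 devices, 200,000 series per device, 100 million aligned series in total; 10 points were written to each series.
For each device, 51 series were deleted and IoTDB was then restarted. WAL recovery shows 2 problems:
Problem 1: Some of the series that were not deleted {color:#DE350B}*return too few data points when queried*{color} (the count is less than 10).
Problem 2: An NPE is thrown during recovery.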
2022-05-20 14:16:09,213 [pool-15-IoTDB-WAL-Recover-2] WARN o.a.i.d.w.r.f.UnsealedTsFileRecoverPerformer:208 - meet error when redo wal of /data/liuzhen_test/master_0519_81b9117/datanode/./sbin/../data/data/sequence/root.test.g_99/0/0/1652977295224-2-0-0.tsfile
org.apache.iotdb.db.exception.WriteProcessException: java.lang.NullPointerException
at org.apache.iotdb.db.engine.memtable.AbstractMemTable.insertAlignedTablet(AbstractMemTable.java:394)
at org.apache.iotdb.db.wal.recover.file.TsFilePlanRedoer.redoInsert(TsFilePlanRedoer.java:128)
at org.apache.iotdb.db.wal.recover.file.UnsealedTsFileRecoverPerformer.redoLog(UnsealedTsFileRecoverPerformer.java:191)
at org.apache.iotdb.db.wal.recover.WALNodeRecoverTask.recoverTsFiles(WALNodeRecoverTask.java:137)
at org.apache.iotdb.db.wal.recover.WALNodeRecoverTask.run(WALNodeRecoverTask.java:63)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException: null
at org.apache.iotdb.db.utils.datastructure.AlignedTVList.arrayCopy(AlignedTVList.java:808)
at org.apache.iotdb.db.utils.datastructure.AlignedTVList.putAlignedValues(AlignedTVList.java:736)
at org.apache.iotdb.db.engine.memtable.AlignedWritableMemChunk.putAlignedValues(AlignedWritableMemChunk.java:152)
at org.apache.iotdb.db.engine.memtable.AlignedWritableMemChunk.writeAlignedValues(AlignedWritableMemChunk.java:182)
at org.apache.iotdb.db.engine.memtable.AlignedWritableMemChunkGroup.writeValues(AlignedWritableMemChunkGroup.java:55)
at org.apache.iotdb.db.engine.memtable.AbstractMemTable.writeAlignedTablet(AbstractMemTable.java:545)
at org.apache.iotdb.db.engine.memtable.AbstractMemTable.insertAlignedTablet(AbstractMemTable.java:377)
... 9 common frames omitted
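Test procedure
1. 192.168.10.68 (72 cores, 256 GB memory)
IoTDB path: /data/liuzhen_test/master_0519_81b9117/datanode
IoTDB configuration (all other settings left unchanged):
MAX_HEAP_SIZE="192G"
MAX_DIRECT_MEMORY_SIZE="32G"
mlog_buffer_size=10485760
schema_engine_mode=Schema_File
benchmark path: /data/benchmark/weekly_shell/bm_0514_ee75a49
The benchmark configuration is in the attachment.
2. Start IoTDB and run the benchmark
This takes about 3 hours.
3. Verify the data before deleting any series
Correct
count_ts_500dev.sh: 200,000 series per device
select_count_ts_500dev.sh: each queried series has 10 points.
4. Delete 51 series from each device
Run del_ts.sh
5. After deleting the series and before stopping IoTDB, verify the data again
Correct
count_ts_500dev.sh: 199,949 series per device
select_count_ts_500dev.sh: each queried series has 10 points.
6. Stop IoTDB
7. Back up the data and logs
8. Restart IoTDB and check the logs: there is an NPE
9. After IoTDB has recovered successfully, run
select_count_ts_500dev.sh {color:#DE350B}*some series have too few data points*{color} (only some of them are listed)
!image-2022-05-20-16-09-01-848.png!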
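The expected values used in steps 3, 5 and 9 can be written down as a small check. This is only a sketch of the arithmetic and the comparison, not the contents of count_ts_500dev.sh or select_count_ts_500dev.sh (those scripts are not shown in this issue). The function names and the sample series paths below are hypothetical and exist only for illustration.

```python
# Sketch of the expected values from the test procedure.
# All names here are hypothetical; they are not taken from the test scripts.

DEVICES = 500                 # devices in the test
SERIES_PER_DEVICE = 200_000   # aligned series per device before the delete
DELETED_PER_DEVICE = 51       # series deleted from each device
EXPECTED_POINTS = 10          # points written to every series


def expected_series_per_device(deleted=DELETED_PER_DEVICE):
    """Series count per device that step 5 should report after the delete."""
    return SERIES_PER_DEVICE - deleted


def find_short_series(counts, expected=EXPECTED_POINTS):
    """Return the series whose point count is below the expected value.

    `counts` maps a series path to the count(...) result for that series.
    """
    return {path: n for path, n in counts.items() if n < expected}


if __name__ == "__main__":
    print(DEVICES * SERIES_PER_DEVICE)    # total series before the delete
    print(expected_series_per_device())   # per-device series after the delete
    # Made-up counts, only to show the comparison from step 9.
    sample = {"root.test.g_0.d_0.s_1": 10, "root.test.g_0.d_0.s_2": 7}
    print(find_short_series(sample))
```

A series reported by `find_short_series` after the restart corresponds to problem 1: it was not deleted, yet fewer than 10 points come back.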
--
This message was sent by Atlassian Jira
(v8.20.7#820007)