Xinyu Tan created IOTDB-5835:
--------------------------------
Summary: Fix wal accumulation caused by datanode restart
Key: IOTDB-5835
URL: https://issues.apache.org/jira/browse/IOTDB-5835
Project: Apache IoTDB
Issue Type: Improvement
Reporter: Xinyu Tan
Assignee: Xinyu Tan
Attachments: image-2023-04-28-11-08-43-542.png,
image-2023-04-28-11-08-51-622.png, image-2023-04-28-11-08-57-549.png,
image-2023-04-28-11-09-03-902.png
When the cluster is running normally and replica A of a consensus group is the
Leader, it continuously sends logs to its followers and updates the WAL's
safelyDeletedSearchIndex after sending them. WAL files are deleted
asynchronously, so if a restart occurs, some logs that have already been
synchronized to other nodes may not have been deleted yet. After the restart,
another replica B may become the Leader, and replica A becomes a
Follower that only receives logs.
Because IoTConsensus currently does not use its recovered syncIndex to set
the safelyDeletedSearchIndex of the underlying WAL node at startup, replica A
cannot delete those WAL files after the restart, so WAL files accumulate.
Write requests of all regions on the node are affected.
!image-2023-04-28-11-08-43-542.png|thumbnail!
!image-2023-04-28-11-08-51-622.png|thumbnail!
!image-2023-04-28-11-08-57-549.png|thumbnail!
!image-2023-04-28-11-09-03-902.png|thumbnail!
The solution to this problem is to update the safelyDeletedSearchIndex of the
reader at startup.
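
The idea can be illustrated with a minimal sketch. This is not the actual
IoTDB code: SketchWalNode and SketchReplica are hypothetical stand-ins, the
"nothing is safe to delete" default of Long.MIN_VALUE is an assumption made
for illustration, and the applyFix flag exists only to contrast the two
behaviors.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical stand-in for a WAL node; names and defaults are assumptions.
class SketchWalNode {
    // Maps the last search index stored in each WAL file to its file name.
    private final TreeMap<Long, String> files = new TreeMap<>();
    // Assumed default after a restart: no entry is considered safe to delete.
    private long safelyDeletedSearchIndex = Long.MIN_VALUE;

    void addFile(long lastSearchIndex, String name) {
        files.put(lastSearchIndex, name);
    }

    void setSafelyDeletedSearchIndex(long index) {
        safelyDeletedSearchIndex = index;
    }

    // Removes every file whose entries are all at or below the safe index
    // and returns the names of the removed files.
    List<String> deleteOutdatedFiles() {
        NavigableMap<Long, String> outdated =
            files.headMap(safelyDeletedSearchIndex, true);
        List<String> deleted = new ArrayList<>(outdated.values());
        outdated.clear(); // the view is backed by 'files', so this removes them
        return deleted;
    }

    int fileCount() {
        return files.size();
    }
}

// Hypothetical stand-in for a consensus replica recovering at startup.
class SketchReplica {
    SketchReplica(SketchWalNode wal, long recoveredSyncIndex, boolean applyFix) {
        if (applyFix) {
            // Proposed behavior: seed the WAL's safe-delete index from the
            // sync index recovered at startup, so that entries already
            // replicated before the restart become deletable even if this
            // replica is now a Follower.
            wal.setSafelyDeletedSearchIndex(recoveredSyncIndex);
        }
    }
}
```

Without the startup update, a replica that restarts as a Follower never
advances the index in this sketch, so deleteOutdatedFiles() removes nothing;
with it, files covered by the recovered syncIndex are reclaimed.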
--
This message was sent by Atlassian Jira
(v8.20.10#820010)