Ethan Rose created HDDS-8770:
--------------------------------
Summary: Cleanup of failed container delete may remove datanode
RocksDB entries of active container
Key: HDDS-8770
URL: https://issues.apache.org/jira/browse/HDDS-8770
Project: Apache Ozone
Issue Type: Sub-task
Components: Ozone Datanode
Reporter: Ethan Rose
Now that container schema v3 has been implemented, container level updates like
delete and import require both moving the container directory and editing the
container's entries in RocksDB.
Originally in commit bf5b6f5 the container delete steps were:
1. Remove entries from RocksDB
2. Delete container directory
In this implementation, it is possible that the RocksDB update succeeds but the
container directory delete fails, leaving behind a container directory on the
disk that is discovered at startup. The datanode would load the container and
recalculate only the metadata values
(KeyValueContainerUtil#verifyAndFixupContainerData). Delete transaction and
block data would be lost, leaving this container corrupted, but reported as
healthy to SCM until the scanner identifies it.
After HDDS-6449, the steps were changed so that failed directory deletes would
not leave broken container directories that the datanode discovers on startup.
The deletion steps became:
1. Move container directory to tmp deleted containers directory on the same
file system (atomic).
2. Delete DB entries
3. Delete container from tmp directory.
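The three steps above can be sketched as follows. This is a minimal
illustration, not the Ozone implementation: the class name, method names, the
"<containerID>|<blockID>" key layout, and the in-memory map standing in for
RocksDB are all assumptions made for the example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ContainerDeleteSketch {
    // Hypothetical stand-in for the volume's RocksDB.
    // Keys are assumed to look like "<containerID>|<blockID>".
    static final TreeMap<String, String> db = new TreeMap<>();

    static void deleteContainer(long containerId, Path containerDir,
                                Path tmpDeleteDir) throws IOException {
        // Step 1: rename into the tmp delete directory. ATOMIC_MOVE only
        // succeeds as a rename when both paths are on the same file system.
        Path moved = tmpDeleteDir.resolve(containerDir.getFileName());
        Files.move(containerDir, moved, StandardCopyOption.ATOMIC_MOVE);

        // Step 2: remove the container's DB entries.
        removeDbEntries(containerId);

        // Step 3: delete the moved directory from the tmp directory.
        deleteRecursively(moved);
    }

    static void removeDbEntries(long containerId) {
        String prefix = containerId + "|";
        db.keySet().removeIf(key -> key.startsWith(prefix));
    }

    static void deleteRecursively(Path root) throws IOException {
        List<Path> deepestFirst;
        try (Stream<Path> paths = Files.walk(root)) {
            // Reverse order puts children before their parent directories.
            deepestFirst = paths.sorted(Comparator.reverseOrder())
                                .collect(Collectors.toList());
        }
        for (Path p : deepestFirst) {
            Files.delete(p);
        }
    }
}
```

If step 2 or step 3 throws, the container directory is already out of the
volume's container directory, so the datanode will not load it on startup.
What remains is a directory under tmp and possibly stale DB entries, which is
the state the cleanup described next has to handle.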
The tmp deleted containers directory is cleared on datanode startup and
shutdown, and this process also clears the corresponding RocksDB entries that
may not have been cleared if an error happened after step 1. This can cause
RocksDB data for an active container replica to be deleted incorrectly in the
following case:
1. Container 1 is deleted. Rename of the container directory to the delete
directory succeeds but DB update fails.
2. Container 1 is re-imported to the same datanode on the same volume. The
imported SST files overwrite the old ones in the DB.
3. Datanode is restarted, triggering cleanup of the deleted container
directory and RocksDB entries for any containers there.
- This deletes the RocksDB entries for container ID 1, which now belong
to the active, re-imported container.
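The sequence above can be reproduced with a small in-memory model. This is
only a sketch under stated assumptions: the class and method names are
hypothetical, sets of container IDs stand in for directories on disk, and a
map keyed by "<containerID>|<blockID>" stands in for RocksDB.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeMap;

public class StaleCleanupSketch {
    // Hypothetical stand-in for the volume's RocksDB.
    final TreeMap<String, String> db = new TreeMap<>();
    // IDs of containers whose directories sit in the tmp delete directory.
    final Set<Long> tmpDeleteDir = new HashSet<>();
    // IDs of containers with a live directory on the volume.
    final Set<Long> activeContainers = new HashSet<>();

    // Import writes the directory and overwrites any existing DB entries.
    void importContainer(long id, String... blocks) {
        activeContainers.add(id);
        for (String block : blocks) {
            db.put(id + "|" + block, "block data");
        }
    }

    // Delete where the rename (step 1) succeeds but the DB update fails.
    void deleteWithDbFailure(long id) {
        activeContainers.remove(id);
        tmpDeleteDir.add(id);
        // The DB update is assumed to fail here, so the entries stay behind.
    }

    // Startup cleanup as described: clear the tmp directory and the DB
    // entries of every container found there, keyed only by container ID.
    void cleanupOnStartup() {
        for (long id : tmpDeleteDir) {
            String prefix = id + "|";
            db.keySet().removeIf(key -> key.startsWith(prefix));
        }
        tmpDeleteDir.clear();
    }

    long blockCount(long id) {
        String prefix = id + "|";
        return db.keySet().stream().filter(k -> k.startsWith(prefix)).count();
    }
}
```

Calling deleteWithDbFailure(1), then importContainer(1, ...), then
cleanupOnStartup() leaves container 1 in activeContainers with a blockCount
of zero. The cleanup cannot tell the stale entries from the re-imported ones
because both are identified by the same container ID.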
Container import can have similar issues as well. We need a standardized
process to keep DB and directory updates consistent and recover from failures
between the two operations.