Ethan Rose created HDDS-8770:
--------------------------------

             Summary: Cleanup of failed container delete may remove datanode 
RocksDB entries of active container
                 Key: HDDS-8770
                 URL: https://issues.apache.org/jira/browse/HDDS-8770
             Project: Apache Ozone
          Issue Type: Sub-task
          Components: Ozone Datanode
            Reporter: Ethan Rose


Now that container schema v3 has been implemented, container-level updates like 
delete and import require both moving the container directory and editing the 
container's entries in RocksDB.

Originally in commit bf5b6f5 the container delete steps were:
1. Remove entries from RocksDB
2. Delete container directory

In this implementation, it is possible that the RocksDB update succeeds but the 
container directory delete fails, leaving behind a container directory on the 
disk that is discovered at startup. The datanode would load the container and 
recalculate only the metadata values 
(KeyValueContainerUtil#verifyAndFixupContainerData). Delete transaction and 
block data would be lost, leaving this container corrupted, but reported as 
healthy to SCM until the scanner identifies it.
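The original ordering can be illustrated with a minimal simulation (plain Java collections standing in for RocksDB and the volume directory; this is not Ozone code, and the names are made up for illustration):

```java
// Hypothetical simulation of the original (bf5b6f5) delete ordering:
// DB entries are removed first, then the directory delete fails, leaving
// an on-disk container directory whose block/delete-transaction metadata
// is already gone.
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class OldDeleteOrder {
  // Returns true if the simulated failure leaves an orphaned directory
  // with no matching DB entries.
  public static boolean simulateFailedDelete(long containerId) {
    Map<Long, String> db = new HashMap<>();      // stand-in for RocksDB
    Set<Long> containerDirs = new HashSet<>();   // stand-in for on-disk dirs
    db.put(containerId, "block metadata");
    containerDirs.add(containerId);

    // Step 1: remove entries from RocksDB -- succeeds.
    db.remove(containerId);

    // Step 2: delete container directory -- fails (simulated), dir remains.
    boolean dirDeleteFailed = true;
    if (!dirDeleteFailed) {
      containerDirs.remove(containerId);
    }

    // On restart the datanode rediscovers the directory, but its DB
    // entries (block data, delete transactions) are already gone.
    return containerDirs.contains(containerId) && !db.containsKey(containerId);
  }

  public static void main(String[] args) {
    System.out.println(simulateFailedDelete(1L)); // prints "true"
  }
}
```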

After HDDS-6449, the steps were changed so that failed directory deletes would 
not leave broken container directories that the datanode discovers on startup. 
The deletion steps became:
1. Move container directory to tmp deleted containers directory on the same 
file system (atomic).
2. Delete DB entries
3. Delete container from tmp directory.
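The three steps above can be sketched as follows (again a simulation, not Ozone's actual implementation: a HashMap stands in for RocksDB, and the method and path names are assumptions for illustration):

```java
// Sketch of the HDDS-6449 delete ordering: atomic rename into a tmp
// "deleted containers" directory on the same filesystem, then DB delete,
// then final removal of the moved directory.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.HashMap;
import java.util.Map;

public class DeleteWithTmpDir {
  public static boolean deleteContainer(Path containerDir, Path tmpDeletedDir,
      Map<Long, String> db, long containerId) throws IOException {
    // Step 1: atomic move to the tmp deleted-containers dir (same filesystem).
    Path moved = tmpDeletedDir.resolve(containerDir.getFileName());
    Files.move(containerDir, moved, StandardCopyOption.ATOMIC_MOVE);

    // Step 2: delete the container's RocksDB entries.
    db.remove(containerId);

    // Step 3: delete the container from the tmp directory.
    Files.delete(moved);

    return !Files.exists(containerDir) && !Files.exists(moved)
        && !db.containsKey(containerId);
  }

  public static void main(String[] args) throws IOException {
    Path vol = Files.createTempDirectory("vol");
    Path container = Files.createDirectory(vol.resolve("container1"));
    Path tmp = Files.createDirectory(vol.resolve("deleted-containers"));
    Map<Long, String> db = new HashMap<>();
    db.put(1L, "block metadata");
    System.out.println(deleteContainer(container, tmp, db, 1L)); // prints "true"
  }
}
```

Because step 1 is a same-filesystem rename, a crash before step 2 leaves the directory intact under the tmp location rather than half-deleted, which is what makes the startup/shutdown cleanup pass possible.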

The tmp deleted containers directory is cleared on datanode startup and 
shutdown, and this cleanup also removes any corresponding RocksDB entries that 
were left behind if an error occurred after step 1. This can cause RocksDB 
data for an active container replica to be deleted incorrectly in the 
following case:
    1. Container 1 is deleted. Rename of the container directory to the delete 
directory succeeds but DB update fails.
    2. Container 1 is re-imported to the same datanode on the same volume. The 
imported SST files overwrite the old ones in the DB.
    3. Datanode is restarted, triggering cleanup of the deleted container 
directory and RocksDB entries for any containers there.
        - This deletes data belonging to container ID 1, which now happens to 
belong to the active container.
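The failure sequence above can be reproduced in miniature (plain Java collections standing in for RocksDB and the tmp directory; names and structure are illustrative assumptions, not Ozone code):

```java
// Hypothetical simulation of the ID-reuse bug: the rename succeeds but the
// DB delete fails, the container is re-imported under the same ID, and the
// startup cleanup then wipes the DB entries of the now-active replica.
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class IdReuseBug {
  public static boolean activeReplicaLosesDbEntries() {
    Map<Long, String> db = new HashMap<>();   // stand-in for RocksDB
    Set<Long> tmpDeletedDir = new HashSet<>(); // container IDs in tmp dir
    long containerId = 1L;
    db.put(containerId, "original block metadata");

    // 1. Delete container 1: rename to tmp dir succeeds, DB delete fails.
    tmpDeletedDir.add(containerId);
    // (simulated failure: db.remove(containerId) never happens)

    // 2. Container 1 is re-imported to the same volume; its imported SST
    //    files overwrite the old entries.
    db.put(containerId, "re-imported block metadata");

    // 3. Datanode restart: cleanup removes DB entries for every container
    //    ID found in the tmp deleted-containers dir -- including ID 1,
    //    which now belongs to the active replica.
    for (long staleId : tmpDeletedDir) {
      db.remove(staleId);
    }
    tmpDeletedDir.clear();

    // The active container replica has lost its RocksDB data.
    return !db.containsKey(containerId);
  }

  public static void main(String[] args) {
    System.out.println(activeReplicaLosesDbEntries()); // prints "true"
  }
}
```

The root of the problem is that cleanup is keyed only by container ID, which is no longer unique once a replica with the same ID has been re-imported between the failed delete and the cleanup pass.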

Container import can hit similar inconsistencies. We need a standardized 
process to keep DB and directory updates consistent and to recover from 
failures between the two operations.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
