Ethan Rose created HDDS-14908:
---------------------------------
Summary: Container scanner should check for updates before persisting merkle tree
Key: HDDS-14908
URL: https://issues.apache.org/jira/browse/HDDS-14908
Project: Apache Ozone
Issue Type: Sub-task
Reporter: Ethan Rose
Assignee: Ethan Rose
By design, there is no coordination between reconciliation and the container
scanner, since both are long-running background processes. As part of this
design, there was a known corner case in which the checksum could temporarily
deviate before eventually converging if the scanner and reconciliation ran at
the same time:
The container scanner builds the Merkle tree in memory as it runs and then
flushes it to disk when it finishes scanning the container. If the container is
updated (via reconciliation) while the scanner is running, the scanner may not
see these updates and, as a result, persist a stale tree at the end of its scan.
A subsequent container scan would then see the fully updated container contents
and write the correct tree, restoring the system to a stable state.
We can prevent this by having the container scanner record the data checksum
when it starts and check it again before it persists the tree. If the checksum
has changed, the container was modified during the scan, so the scanner's
in-memory tree may be stale. The scanner must then discard its work and rescan
the container to build a new tree.
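A minimal Java sketch of the proposed check follows, written as a single-threaded simulation so the race is reproducible. All names (Container, scanAndPersist, reconcileBlock, rootChecksum) are hypothetical and are not Ozone's actual API. It assumes a toy model in which a container is a map of block ID to per-block checksum, and a flat order-dependent hash stands in for the Merkle tree root. It also assumes that reconciliation and the scanner's final check-and-persist step hold the same container lock, so no update can slip in between the second checksum read and the write; whether Ozone coordinates that final step this way is an assumption here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical simulation of "re-check the data checksum before persisting". */
public class ScanBeforePersist {

  /** Toy container (assumption): block ID -> per-block data checksum. */
  static final class Container {
    private final TreeMap<Long, Long> blocks = new TreeMap<>();
    private long dataChecksum; // root checksum of the most recently persisted tree

    Container(Map<Long, Long> initial) {
      blocks.putAll(initial);
      dataChecksum = rootChecksum(blocks);
    }

    synchronized long getDataChecksum() { return dataChecksum; }

    synchronized List<Long> blockIds() { return new ArrayList<>(blocks.keySet()); }

    synchronized long readBlock(long id) { return blocks.get(id); }

    /** Simulated reconciliation: repairs a block and persists its own updated tree. */
    synchronized void reconcileBlock(long id, long newChecksum) {
      blocks.put(id, newChecksum);
      dataChecksum = rootChecksum(blocks);
    }

    synchronized void persistTree(long root) { dataChecksum = root; }
  }

  /** Order-dependent hash standing in for a Merkle root (not a real Merkle tree). */
  static long rootChecksum(SortedMap<Long, Long> tree) {
    long h = 17;
    for (Map.Entry<Long, Long> e : tree.entrySet()) {
      h = 31 * h + e.getKey();
      h = 31 * h + e.getValue();
    }
    return h;
  }

  /**
   * Builds the tree in memory, then persists it only if the data checksum is
   * unchanged since the scan started; otherwise discards it and rescans.
   * {@code duringFirstScan} simulates reconciliation landing mid-scan.
   * Returns the number of scan attempts.
   */
  static int scanAndPersist(Container c, Runnable duringFirstScan) {
    for (int attempt = 1; ; attempt++) {
      long checksumAtStart = c.getDataChecksum();
      TreeMap<Long, Long> tree = new TreeMap<>();
      for (long id : c.blockIds()) {
        tree.put(id, c.readBlock(id));
        if (attempt == 1 && tree.size() == 1) {
          duringFirstScan.run(); // update arrives after block 1 was already read
        }
      }
      // Re-check and persist under the container lock, which reconciliation
      // also takes, so the check and the write happen atomically.
      synchronized (c) {
        if (c.getDataChecksum() == checksumAtStart) {
          c.persistTree(rootChecksum(tree));
          return attempt;
        }
      }
      // Checksum changed mid-scan: the in-memory tree may be stale, so rescan.
    }
  }

  public static void main(String[] args) {
    Container c = new Container(Map.of(1L, 100L, 2L, 200L));
    long staleRoot = rootChecksum(new TreeMap<>(Map.of(1L, 100L, 2L, 200L)));
    int attempts = scanAndPersist(c, () -> c.reconcileBlock(1L, 101L));
    System.out.println("attempts=" + attempts);
    System.out.println("stale tree persisted? " + (c.getDataChecksum() == staleRoot));
  }
}
```

Running main prints attempts=2 and "stale tree persisted? false": the first scan read block 1 before reconciliation changed it, detected the checksum change, and rescanned instead of overwriting reconciliation's tree. Note that the retry loop in this sketch is unbounded; a real scanner might instead cap retries or defer the container to its next scan if reconciliation keeps modifying it.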
--
This message was sent by Atlassian Jira
(v8.20.10#820010)