+1 for the merge On Tue, Jun 10, 2025 at 8:44 PM Ethan Rose <er...@apache.org> wrote:
> Based on discussion in the community sync this week I want to add some more > information. > Code > > For those interested in checking out the code, these are some of the major > classes to start with: > > - ReconcileContainerTask: This is the command on the datanode that is > received from SCM to reconcile a container with a datanode’s peers. It > passes through the ReplicationSupervisor just like replication and > reconstruction commands. > - ContainerProtos.ContainerChecksumInfo: This is the proto format of the > new file that is written into the containers with the merkle tree and > list > of deleted blocks. > - ContainerMerkleTreeWriter: This class is used to build merkle trees > chunk by chunk and generate a protobuf representation of the tree. > - ContainerChecksumTreeManager: This class coordinates reads and writes > of ContainerChecksumInfo for containers. The diff method determines > which repairs should be done on a container based on a peer’s merkle > tree. > - KeyValueContainerCheck#scanData: This is the existing method called by > the background and on-demand container data scanners to scan a > container. > It has been updated to build the merkle tree as it runs. > - KeyValueHandler#reconcileContainer: This method updates the container > based on the peer’s replica. > - Major tests for reconciliation have been added to > TestContainerCommandReconciliation (integration test) and > TestContainerReconciliationWithMockDatanodes (unit test with mocked > clients). > - There are more tasks under the reconciliation jira to expand the > types of faults being tested. > > Logging > > Logging was added on the datanodes to track reconciliation as it is > happening. The datanode application log will print a summary of messages > like this: > > 2025-06-10 20:13:14,570 [main] INFO keyvalue.KeyValueHandler > (KeyValueHandler.java:reconcileContainer(1595)) - Beginning > reconciliation for container 100 with peer > bbc09073-ac0d-4b2f-afe4-1de5f9dc6f43(dn3/237.6.76.4). Current data > checksum is dcce847d > 2025-06-10 20:13:14,589 [main] WARN keyvalue.KeyValueHandler > (KeyValueHandler.java:reconcileContainer(1681)) - Container 100 > reconciled with peer > bbc09073-ac0d-4b2f-afe4-1de5f9dc6f43(dn3/237.6.76.4). Data checksum > updated from dcce847d to 16189e0b. > Missing blocks repaired: 5/5 > Missing chunks repaired: 0/0 > Corrupt chunks repaired: 10/10 > Time taken: 19 ms > 2025-06-10 20:13:14,589 [main] WARN keyvalue.KeyValueHandler > (KeyValueHandler.java:reconcileContainer(1704)) - Completed > reconciliation for container 100 with 1/1 peers. 15 blocks were > updated. Data checksum updated from dcce847d to 16189e0b > > This shows: > > - Reconciliation started between this datanode and one other peer for > container 100 > - After reconciliation with the peer completed, the data checksum of our > container was updated > - Compared to this peer, we needed to ingest 5 missing blocks and repair > 10 corrupt chunks. All operations were successful > - At the end we get a summary of how many changes were done to this > container after consulting all the peers in the reconcile request. In > this > case there was only one peer. > By enabling debug logging we can see the individual blocks and chunks > that were repaired as well. > > In the dn-container.log file, dataChecksum is now included for every log > line. We also get one new line in this log every time the checksum for a > container is updated. > > In case logs roll off, a debug tool to inspect container’s checksum > information locally on a datanode will be implemented in HDDS-13239 > <https://issues.apache.org/jira/browse/HDDS-13239>. > Metrics > > The metrics for reconciliation tasks are available as a part of > ReplicationSupervisor class which includes: > > - numRequestedContainerReconciliations - Number of reconciliation tasks > - numQueuedContainerReconciliations - Number of queued tasks > - numTimeoutContainerReconciliations - Number of timed-out tasks > - numSuccessContainerReconciliations- Number of Success > - numFailureContainerReconciliations - Number of Failures > - numSkippedContainerReconciliations - Number of Skipped Tasks > > Latency/Count metrics for the tasks exposed by CommandHandlerMetrics for > ReconcileContainerCommandHandler: > > - TotalRunTimeMs - The total runtime of the command handler in > milliseconds > - AvgRunTimeMs - Average run time of the command handler in milliseconds > - QueueWaitingTaskCount - The number of queued tasks waiting for > execution > - InvocationCount - The number of times the command handler has been > invoked > - CommandReceivedCount - The number of received SCM commands for each > command type > > Other container reconciliation-related tasks are encapsulated in > ContainerMerkleTreeMetrics: > > - numMerkleTreeWriteFailure - Number of Merkle tree write failure > - numMerkleTreeReadFailure - Number of Merkle tree read failure > - numMerkleTreeDiffFailure - Number of Merkle tree diff failure > - numNoRepairContainerDiff - Number of container diff that doesn’t > require repair > - numRepairContainerDiff - Number of container diff that require repair > - merkleTreeWriteLatencyNS- Merkle tree write latency > - merkleTreeReadLatencyNS - Merkle tree read latency > - merkleTreeCreateLatencyNS - Merkle tree creation latency > - merkleTreeDiffLatencyNS - Merkle tree diff latency > > > Thanks for reviewing > > Ethan >