+1 for the merge

On Tue, Jun 10, 2025 at 8:44 PM Ethan Rose <er...@apache.org> wrote:

> Based on discussion in the community sync this week I want to add some more
> information.
>
> Code
>
> For those interested in checking out the code, these are some of the major
> classes to start with:
>
>    - ReconcileContainerTask: This is the command on the datanode that is
>    received from SCM to reconcile a container with a datanode’s peers. It
>    passes through the ReplicationSupervisor just like replication and
>    reconstruction commands.
>    - ContainerProtos.ContainerChecksumInfo: This is the proto format of the
>    new file that is written into the containers with the merkle tree and
>    list of deleted blocks.
>    - ContainerMerkleTreeWriter: This class is used to build merkle trees
>    chunk by chunk and generate a protobuf representation of the tree.
>    - ContainerChecksumTreeManager: This class coordinates reads and writes
>    of ContainerChecksumInfo for containers. The diff method determines
>    which repairs should be done on a container based on a peer’s merkle
>    tree.
>    - KeyValueContainerCheck#scanData: This is the existing method called by
>    the background and on-demand container data scanners to scan a
>    container. It has been updated to build the merkle tree as it runs.
>    - KeyValueHandler#reconcileContainer: This method updates the container
>    based on the peer’s replica.
>    - Major tests for reconciliation have been added to
>    TestContainerCommandReconciliation (integration test) and
>    TestContainerReconciliationWithMockDatanodes (unit test with mocked
>    clients).
>       - There are more tasks under the reconciliation jira to expand the
>       types of faults being tested.
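
For readers new to the idea, here is a minimal sketch of building a checksum tree chunk by chunk and diffing it against a peer's tree. It is not the Ozone implementation: the class MerkleSketch, its methods, the CRC32 hashing, and the string-based repair list are all assumptions made for illustration. A checksum mismatch is reported only as "mismatched" because the tree alone does not say which replica is correct.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;

// Hypothetical sketch; none of these names come from the Ozone code base.
public class MerkleSketch {
    // blockId -> (chunk offset -> chunk checksum), sorted so hashing is deterministic.
    private final TreeMap<Long, TreeMap<Long, Long>> blocks = new TreeMap<>();

    /** Adds one chunk's checksum to the tree, creating the block entry if needed. */
    public void addChunk(long blockId, long offset, byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data);
        blocks.computeIfAbsent(blockId, id -> new TreeMap<>()).put(offset, crc.getValue());
    }

    /** Hashes the (key, checksum) pairs of all children into one parent checksum. */
    private static long combine(Map<Long, Long> children) {
        CRC32 crc = new CRC32();
        for (Map.Entry<Long, Long> child : children.entrySet()) {
            ByteBuffer buf = ByteBuffer.allocate(2 * Long.BYTES);
            buf.putLong(child.getKey()).putLong(child.getValue());
            crc.update(buf.array());
        }
        return crc.getValue();
    }

    /** Checksum of one block, derived from its chunk checksums. */
    public long blockChecksum(long blockId) {
        return combine(blocks.get(blockId));
    }

    /** Root checksum of the whole tree, derived from the block checksums. */
    public long dataChecksum() {
        TreeMap<Long, Long> blockSums = new TreeMap<>();
        for (long blockId : blocks.keySet()) {
            blockSums.put(blockId, blockChecksum(blockId));
        }
        return combine(blockSums);
    }

    /** Lists what this replica would need to fetch to match the peer's tree. */
    public List<String> diff(MerkleSketch peer) {
        List<String> repairs = new ArrayList<>();
        if (dataChecksum() == peer.dataChecksum()) {
            return repairs; // roots match, so no subtree needs to be visited
        }
        for (Map.Entry<Long, TreeMap<Long, Long>> peerBlock : peer.blocks.entrySet()) {
            long blockId = peerBlock.getKey();
            TreeMap<Long, Long> ours = blocks.get(blockId);
            if (ours == null) {
                repairs.add("missing block " + blockId);
                continue;
            }
            if (blockChecksum(blockId) == peer.blockChecksum(blockId)) {
                continue; // block matches, skip its chunks
            }
            for (Map.Entry<Long, Long> peerChunk : peerBlock.getValue().entrySet()) {
                Long ourSum = ours.get(peerChunk.getKey());
                if (ourSum == null) {
                    repairs.add("missing chunk " + blockId + "@" + peerChunk.getKey());
                } else if (!ourSum.equals(peerChunk.getValue())) {
                    repairs.add("mismatched chunk " + blockId + "@" + peerChunk.getKey());
                }
            }
        }
        return repairs;
    }
}
```

The point of the tree shape is that matching checksums at the root or block level let the diff skip everything underneath, so only the differing blocks are compared chunk by chunk.
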
>
> Logging
>
> Logging was added on the datanodes to track reconciliation as it is
> happening. The datanode application log will print summary messages
> like these:
>
> 2025-06-10 20:13:14,570 [main] INFO  keyvalue.KeyValueHandler
> (KeyValueHandler.java:reconcileContainer(1595)) - Beginning
> reconciliation for container 100 with peer
> bbc09073-ac0d-4b2f-afe4-1de5f9dc6f43(dn3/237.6.76.4). Current data
> checksum is dcce847d
> 2025-06-10 20:13:14,589 [main] WARN  keyvalue.KeyValueHandler
> (KeyValueHandler.java:reconcileContainer(1681)) - Container 100
> reconciled with peer
> bbc09073-ac0d-4b2f-afe4-1de5f9dc6f43(dn3/237.6.76.4). Data checksum
> updated from dcce847d to 16189e0b.
> Missing blocks repaired: 5/5
> Missing chunks repaired: 0/0
> Corrupt chunks repaired:  10/10
> Time taken: 19 ms
> 2025-06-10 20:13:14,589 [main] WARN  keyvalue.KeyValueHandler
> (KeyValueHandler.java:reconcileContainer(1704)) - Completed
> reconciliation for container 100 with 1/1 peers. 15 blocks were
> updated. Data checksum updated from dcce847d to 16189e0b
>
> This shows:
>
>    - Reconciliation started between this datanode and one other peer for
>    container 100
>    - After reconciliation with the peer completed, the data checksum of our
>    container was updated
>    - Compared to this peer, we needed to ingest 5 missing blocks and repair
>    10 corrupt chunks. All operations were successful
>    - At the end we get a summary of how many changes were done to this
>    container after consulting all the peers in the reconcile request. In
>    this case there was only one peer.
>
> By enabling debug logging we can see the individual blocks and chunks
> that were repaired as well.
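
If someone wants to pull the repair counters out of the summary message shown in the log sample, a rough sketch follows. The class ReconcileLogSummary and its methods are hypothetical and not part of Ozone. It assumes each "N/M" pair means repaired out of needed, which is how the explanation above reads, and it assumes the message wording stays as in the sample.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the message format is taken from the log sample in this thread.
public class ReconcileLogSummary {
    // Matches lines such as "Corrupt chunks repaired:  10/10".
    private static final Pattern REPAIR_LINE = Pattern.compile(
        "(Missing blocks|Missing chunks|Corrupt chunks) repaired:\\s*(\\d+)/(\\d+)");

    /** Returns category -> {repaired, needed} for every summary line found in the text. */
    public static Map<String, int[]> parse(String logText) {
        Map<String, int[]> counts = new LinkedHashMap<>();
        Matcher m = REPAIR_LINE.matcher(logText);
        while (m.find()) {
            int repaired = Integer.parseInt(m.group(2));
            int needed = Integer.parseInt(m.group(3));
            counts.put(m.group(1), new int[] {repaired, needed});
        }
        return counts;
    }

    /** True when every category repaired everything it needed (also true for an empty map). */
    public static boolean allRepaired(Map<String, int[]> counts) {
        for (int[] pair : counts.values()) {
            if (pair[0] != pair[1]) {
                return false;
            }
        }
        return true;
    }
}
```

Since the matcher uses find(), it also works on text where each line carries a quote prefix, as in this email.
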
>
> In the dn-container.log file, dataChecksum is now included for every log
> line. We also get one new line in this log every time the checksum for a
> container is updated.
>
> In case logs roll off, a debug tool to inspect a container’s checksum
> information locally on a datanode will be implemented in HDDS-13239
> <https://issues.apache.org/jira/browse/HDDS-13239>.
>
> Metrics
>
> The metrics for reconciliation tasks are available as part of the
> ReplicationSupervisor class, which includes:
>
>    - numRequestedContainerReconciliations - Number of reconciliation tasks
>    - numQueuedContainerReconciliations - Number of queued tasks
>    - numTimeoutContainerReconciliations - Number of timed-out tasks
>    - numSuccessContainerReconciliations - Number of successful tasks
>    - numFailureContainerReconciliations - Number of failed tasks
>    - numSkippedContainerReconciliations - Number of skipped tasks
>
> Latency/Count metrics for the tasks are exposed by CommandHandlerMetrics
> for ReconcileContainerCommandHandler:
>
>    - TotalRunTimeMs - The total runtime of the command handler in
>    milliseconds
>    - AvgRunTimeMs - Average run time of the command handler in milliseconds
>    - QueueWaitingTaskCount - The number of queued tasks waiting for
>    execution
>    - InvocationCount - The number of times the command handler has been
>    invoked
>    - CommandReceivedCount - The number of received SCM commands for each
>    command type
>
> Other container reconciliation-related metrics are encapsulated in
> ContainerMerkleTreeMetrics:
>
>    - numMerkleTreeWriteFailure - Number of Merkle tree write failures
>    - numMerkleTreeReadFailure - Number of Merkle tree read failures
>    - numMerkleTreeDiffFailure - Number of Merkle tree diff failures
>    - numNoRepairContainerDiff - Number of container diffs that don’t
>    require repair
>    - numRepairContainerDiff - Number of container diffs that require repair
>    - merkleTreeWriteLatencyNS - Merkle tree write latency
>    - merkleTreeReadLatencyNS - Merkle tree read latency
>    - merkleTreeCreateLatencyNS - Merkle tree creation latency
>    - merkleTreeDiffLatencyNS - Merkle tree diff latency
>
>
> Thanks for reviewing
>
> Ethan
>
