[ https://issues.apache.org/jira/browse/HDDS-14936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ethan Rose resolved HDDS-14936.
-------------------------------
Resolution: Implemented
> Container data checksum reverted by BackgroundContainerDataScanner after
> successful reconciliation
> --------------------------------------------------------------------------------------------------
>
> Key: HDDS-14936
> URL: https://issues.apache.org/jira/browse/HDDS-14936
> Project: Apache Ozone
> Issue Type: Bug
> Components: Ozone Datanode
> Reporter: Yashaswini G A
> Priority: Major
>
> After all three Ratis replicas of a CLOSED container were corrupted on
> different block files, {{ozone admin container reconcile}} was run.
> Reconciliation with peers completed on one datanode, reporting corrupt chunks
> repaired and updating the data checksum (76950a80 -> 914f24e4). Shortly
> afterward, BackgroundContainerDataScanner on the same datanode logged
> CORRUPT_CHUNK for a .block file, reported an OzoneChecksumException on read,
> and updated the container data checksum again (914f24e4 -> 76950a80).
> Subsequent scans flipped the checksum again. {{ozone admin container reconcile --status}}
> showed replicasMatch=false, with one replica still on the older checksum.
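To make "reverted" precise when triaging similar logs, here is a minimal sketch that assumes the per-replica data checksum values have already been extracted from the datanode logs by hand, in order. It flags the first update that restores a checksum the replica had already moved away from. `ChecksumReversionCheck` and `firstReversion` are hypothetical names, not Ozone APIs; the example input is the sequence reported above.

```java
import java.util.List;

/**
 * Hypothetical triage helper, not part of Apache Ozone. Given the data checksum
 * values logged for one container replica, in order, it finds the first update
 * that restores a value the replica had already moved away from.
 */
public final class ChecksumReversionCheck {

    private ChecksumReversionCheck() {
    }

    /**
     * Returns the index of the first reverting update, or -1 if none exists.
     * Re-logging the current value is not treated as a change.
     */
    public static int firstReversion(List<String> history) {
        for (int i = 2; i < history.size(); i++) {
            String current = history.get(i);
            if (current.equals(history.get(i - 1))) {
                continue; // same value logged twice: no change, so no reversion
            }
            // A reversion restores a value seen before the immediately preceding one.
            if (history.subList(0, i - 1).contains(current)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Values from this report: before reconcile, after reconcile, after the scan.
        List<String> observed = List.of("76950a80", "914f24e4", "76950a80");
        System.out.println(firstReversion(observed)); // prints 2
    }
}
```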
> h2. Steps to reproduce
> # Close the container; note the per-replica dataChecksum (a three-way mismatch
> after corruption).
> # Run {{ozone admin container reconcile <containerID>}}.
> # Observe DN logs: ReconcileContainerTask / KeyValueHandler report a
> successful reconcile and a checksum update.
> # Within seconds to minutes, observe ContainerDataScanner logs on the same DN:
> CORRUPT_CHUNK, with the checksum updated in the opposite direction.
> # Optionally, poll {{ozone admin container reconcile <containerID> --status}} and
> observe replicasMatch=false and lingering checksum divergence on one replica.
> h2. Expected behavior
> After a successful reconcile that reports corrupt chunks as repaired and leaves
> a stable data checksum aligned with peers, the background data scan should not
> report the same chunk as corrupt and should not revert the container data
> checksum unless there is a documented second source of truth.
> h2. Actual behavior
> Reconcile reports DONE with the checksum aligned to peers;
> BackgroundContainerDataScanner then reports CORRUPT_CHUNK and moves the data
> checksum away from the post-reconcile value.
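The report does not identify a root cause. One possibility, stated here only as an assumption, is that a scan began before the reconcile committed and then published a checksum computed from the pre-reconcile state. Under that assumption, the expected behavior can be modeled as a compare-and-set: reconcile writes unconditionally, and a scan publishes its result only if the checksum has not been replaced since the scan began. `GuardedChecksum` and its methods are hypothetical and do not describe Ozone's actual scanner or reconcile code.

```java
import java.util.concurrent.atomic.AtomicReference;

/**
 * Hypothetical model, not Ozone's implementation, of the expected invariant: a
 * background scan may publish its computed checksum only if no other writer,
 * such as a reconcile, has replaced the container checksum since the scan began.
 */
public final class GuardedChecksum {

    private final AtomicReference<String> dataChecksum;

    public GuardedChecksum(String initial) {
        this.dataChecksum = new AtomicReference<>(initial);
    }

    public String get() {
        return dataChecksum.get();
    }

    /** Reconcile is treated as authoritative and overwrites unconditionally. */
    public void applyReconcile(String repaired) {
        dataChecksum.set(repaired);
    }

    /** Snapshot taken when a scan starts; pass it back to publishScanResult. */
    public String beginScan() {
        return dataChecksum.get();
    }

    /**
     * Publishes the scanner's checksum only if the stored value is still the
     * exact object observed at scan start. AtomicReference compares references,
     * so the snapshot returned by beginScan must be passed back unchanged.
     */
    public boolean publishScanResult(String snapshot, String scanned) {
        return dataChecksum.compareAndSet(snapshot, scanned);
    }

    public static void main(String[] args) {
        GuardedChecksum c = new GuardedChecksum("76950a80");
        String snapshot = c.beginScan();   // scan starts against pre-reconcile state
        c.applyReconcile("914f24e4");      // reconcile commits while the scan runs
        boolean published = c.publishScanResult(snapshot, "76950a80");
        System.out.println(published + " " + c.get()); // prints "false 914f24e4"
    }
}
```

A monotonically increasing version number would serve the same purpose as the reference snapshot and would also detect a reconcile that happened to produce an identical checksum.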
--
This message was sent by Atlassian Jira
(v8.20.10#820010)