Yashaswini G A created HDDS-14936:
-------------------------------------

             Summary: Container data checksum reverted by 
BackgroundContainerDataScanner after successful reconciliation
                 Key: HDDS-14936
                 URL: https://issues.apache.org/jira/browse/HDDS-14936
             Project: Apache Ozone
          Issue Type: Bug
          Components: Ozone Datanode
            Reporter: Yashaswini G A


All three Ratis replicas of a CLOSED container were corrupted, each on a 
different block file, and then {{ozone admin container reconcile}} was run. 
Reconciliation with peers completed on one datanode, which reported that 
corrupt chunks were repaired and that the data checksum was updated 
(76950a80 -> 914f24e4). Shortly afterward, BackgroundContainerDataScanner on 
the same datanode logged CORRUPT_CHUNK for a .block file, hit an 
OzoneChecksumException on read, and updated the container data checksum again 
(914f24e4 -> 76950a80). Later scans flipped the checksum again. 
{{ozone admin container reconcile --status}} showed replicasMatch=false, with 
one replica still on the older checksum.
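
The flip described above can be checked mechanically once the checksum updates are collected from the datanode logs. The following is a minimal Python sketch, not Ozone code: the tuple layout, the {{find_reverts}} helper and the "reconcile"/"scanner" labels are illustrative assumptions; only the two checksum values come from this report.

```python
from typing import List, Tuple

# (component that wrote the update, old checksum, new checksum).
# The component labels are illustrative, not actual log field names.
ChecksumUpdate = Tuple[str, str, str]


def find_reverts(updates: List[ChecksumUpdate]) -> List[Tuple[int, int]]:
    """Return (i, j) index pairs where update j exactly undoes update i."""
    reverts = []
    for j in range(1, len(updates)):
        i = j - 1
        _, old_i, new_i = updates[i]
        _, old_j, new_j = updates[j]
        # A revert moves the checksum straight back to its previous value.
        if old_i != new_i and old_j == new_i and new_j == old_i:
            reverts.append((i, j))
    return reverts


# Checksum transitions quoted in this report, in the order they were logged.
observed = [
    ("reconcile", "76950a80", "914f24e4"),
    ("scanner", "914f24e4", "76950a80"),
]
print(find_reverts(observed))  # [(0, 1)]
```

Any non-empty result means a later writer undid an earlier update, which is the symptom reported here.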


h2. Steps to reproduce (high level)
 # Close a container, corrupt a different block file on each of the three 
replicas, and note the per-replica dataChecksum (three-way mismatch after 
corruption).

 # Run {{ozone admin container reconcile <containerID>}}.

 # Observe DN logs: ReconcileContainerTask / KeyValueHandler reports successful 
reconcile and checksum update.

 # Within seconds/minutes, observe ContainerDataScanner logs on same DN: 
CORRUPT_CHUNK, checksum updated in opposite direction.

 # Optionally poll {{ozone admin container reconcile <id> --status}} and 
observe replicasMatch=false and lingering checksum divergence on one replica.
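
The polling in the last step can be scripted. The following is a hedged Python sketch: {{fetch}} is a hypothetical callable that would wrap {{ozone admin container reconcile <id> --status}} and return a replica-to-checksum mapping. The output format of that command is not shown in this report, so the parsing is deliberately left out, and {{replicas_match}} and {{watch_checksums}} are illustrative names.

```python
import time
from typing import Callable, Dict, List


def replicas_match(checksums: Dict[str, str]) -> bool:
    """True when every replica reports the same data checksum."""
    return len(set(checksums.values())) <= 1


def watch_checksums(
    fetch: Callable[[], Dict[str, str]],
    polls: int,
    interval_s: float = 0.0,
) -> List[Dict[str, str]]:
    """Poll `fetch` and keep each snapshot that differs from the previous one."""
    history: List[Dict[str, str]] = []
    for _ in range(polls):
        snapshot = fetch()
        # Record only changes, so flips stand out in the history.
        if not history or snapshot != history[-1]:
            history.append(dict(snapshot))
        time.sleep(interval_s)
    return history
```

A history with more than one entry after reconcile has reported success, or a final snapshot for which {{replicas_match}} is false, would correspond to the divergence described in this report.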


h2. Expected behavior

After a successful reconcile reporting corrupt chunks repaired and a stable 
data checksum aligned with peers, background data scan should not report the 
same chunk as corrupt and should not revert the container data checksum unless 
there is a documented second source of truth.

h2. Actual behavior

Reconcile reports DONE with the checksum aligned to peers. 
BackgroundContainerDataScanner then reports CORRUPT_CHUNK and moves the data 
checksum away from the post-reconcile value.



