[ 
https://issues.apache.org/jira/browse/HDDS-14936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yashaswini G A updated HDDS-14936:
----------------------------------
    Description: 
After all three Ratis replicas of a CLOSED container were corrupted, each in a 
different block file, {{ozone admin container reconcile}} was run. 
Reconciliation with peers completed on a datanode, which reported the corrupt 
chunks as repaired and updated the data checksum (76950a80 -> 914f24e4). 
Shortly afterward, BackgroundContainerDataScanner on the same datanode logged 
CORRUPT_CHUNK for a .block file, hit an OzoneChecksumException on read, and 
updated the container data checksum again (914f24e4 -> 76950a80). Later scans 
flipped the checksum again. {{ozone admin container reconcile --status}} showed 
replicasMatch=false, with one replica still on the older checksum.
h2. Steps to reproduce 
 # Close the container, corrupt a different block file on each replica, and 
note the per-replica dataChecksum (three-way mismatch after corruption).
 # Run {{ozone admin container reconcile <containerID>}}.
 # Observe DN logs: ReconcileContainerTask / KeyValueHandler reports successful 
reconcile and checksum update.
 # Within seconds/minutes, observe ContainerDataScanner logs on same DN: 
CORRUPT_CHUNK, checksum updated in opposite direction.
 # Optionally poll {{ozone admin container reconcile <id> --status}} and 
observe replicasMatch=false and lingering checksum divergence on one replica.
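
A minimal sketch of the check in step 5, assuming the {{--status}} output is a 
JSON array in which each container object carries {{containerID}}, 
{{replicasMatch}}, and a {{replicas}} list with one {{dataChecksum}} per 
replica. Only {{replicasMatch}} and {{dataChecksum}} are named in this report; 
the rest of that shape and the helper name {{divergent_containers}} are 
assumptions for illustration and may need adjusting to the real output.

```python
import json


def divergent_containers(status_json):
    """Return {containerID: sorted distinct checksums} for mismatched containers.

    The JSON layout (containerID / replicasMatch / replicas[].dataChecksum)
    is an ASSUMED shape, not confirmed by the report.
    """
    diverged = {}
    for container in json.loads(status_json):
        checksums = sorted({r["dataChecksum"] for r in container["replicas"]})
        # Flag the container if the replicas disagree or the CLI says so.
        if len(checksums) > 1 or not container.get("replicasMatch", True):
            diverged[container["containerID"]] = checksums
    return diverged
```

Running this on the captured status output after each background scan would 
show which checksum values are still in play for the container.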

h2. Expected behavior

After a successful reconcile reporting corrupt chunks repaired and a stable 
data checksum aligned with peers, background data scan should not report the 
same chunk as corrupt and should not revert the container data checksum unless 
there is a documented second source of truth.
h2. Actual behavior

Reconcile reports DONE with the checksum aligned to peers; 
BackgroundContainerDataScanner then reports CORRUPT_CHUNK and updates the data 
checksum away from the post-reconcile value.
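
The revert can be stated as a simple check over the sequence of data checksums 
one replica reports over time. The sketch below is illustrative only: the 
helper name {{find_reverts}} is hypothetical, and the history is assumed to be 
collected by hand from the datanode logs.

```python
def find_reverts(history):
    """Return (index, checksum) for each update that restores an earlier value."""
    seen = set()
    reverts = []
    previous = None
    for i, checksum in enumerate(history):
        # A revert is a change to a checksum this replica already held before.
        if checksum != previous and checksum in seen:
            reverts.append((i, checksum))
        seen.add(checksum)
        previous = checksum
    return reverts


# Checksums from this report: initial value, post-reconcile, post-scan.
observed = ["76950a80", "914f24e4", "76950a80"]
reverts = find_reverts(observed)
```

For the values in this report, the third entry is flagged because the scanner 
restored the pre-reconcile checksum.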


> Container data checksum reverted by BackgroundContainerDataScanner after 
> successful reconciliation
> --------------------------------------------------------------------------------------------------
>
>                 Key: HDDS-14936
>                 URL: https://issues.apache.org/jira/browse/HDDS-14936
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: Ozone Datanode
>            Reporter: Yashaswini G A
>            Priority: Major


