[
https://issues.apache.org/jira/browse/HDDS-15261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sumit Agrawal resolved HDDS-15261.
----------------------------------
Fix Version/s: 2.2.0
Resolution: Fixed
> Unhealthy container never showed up in the container report in 30 minutes
> after corrupting chunks on all datanodes that hold replicas of that container
> -------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: HDDS-15261
> URL: https://issues.apache.org/jira/browse/HDDS-15261
> Project: Apache Ozone
> Issue Type: Bug
> Reporter: Sarveksha Yeshavantha Raju
> Assignee: Sarveksha Yeshavantha Raju
> Priority: Major
> Labels: pull-request-available
> Fix For: 2.2.0
>
>
> Steps to reproduce
> 1. Open a key write (associated with container ID 1).
> 2. Identify the DN replicas for the container and corrupt/delete a chunk
> of the container on each datanode.
> 3. Run container info to find the state of the replica. After around 15
> attempts it reported an unhealthy state for the replica.
> 4. Each datanode reported that the container is unhealthy after 15 mins (at
> 2026-04-17 13:01:04,170, as seen in the datanode logs):
> {noformat}
> 2026-04-17 13:01:04,162 | WARN | ID=1 | Index=0 | BCSID=152 | State=CLOSED |
> Volume=/hadoop-ozone/datanode/data429418/hdds | DataChecksum=0 | Container
> data checksum updated from c15e627b to 0 |
> 2026-04-17 13:01:04,170 | ERROR | ID=1 | Index=0 | BCSID=152 |
> State=UNHEALTHY | Volume=/hadoop-ozone/datanode/data429418/hdds |
> DataChecksum=0 | Container has 1 error: MISSING_CHUNKS_DIR for file
> /hadoop-ozone/datanode/data429418/hdds/CID-f637f2ff-4884-45d1-81d2-82b4a936cdfd/current/containerDir0/1/chunks
> with exception: java.io.FileNotFoundException: Chunks directory
> /hadoop-ozone/datanode/data429418/hdds/CID-f637f2ff-4884-45d1-81d2-82b4a936cdfd/current/containerDir0/1/chunks
> not found. |
> {noformat}
> 5. Trigger the container report, expecting the UNHEALTHY container count
> to be incremented. SCM did not pick it up even after 30 mins.
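
When reproducing this, it can help to pin down exactly when a datanode marked the container UNHEALTHY, so that the delay before SCM reflects it can be measured. The following is a minimal sketch, not part of Ozone: it assumes the pipe-delimited layout seen in the two log entries quoted in step 4 (timestamp, level, then `key=value` fields and a free-text message), and the helper names `parse_entry` and `first_unhealthy` are hypothetical.

```python
from datetime import datetime

# The two entries quoted in step 4, with the e-mail line wrapping undone.
LOG_LINES = [
    "2026-04-17 13:01:04,162 | WARN | ID=1 | Index=0 | BCSID=152 | State=CLOSED | "
    "Volume=/hadoop-ozone/datanode/data429418/hdds | DataChecksum=0 | "
    "Container data checksum updated from c15e627b to 0 |",
    "2026-04-17 13:01:04,170 | ERROR | ID=1 | Index=0 | BCSID=152 | State=UNHEALTHY | "
    "Volume=/hadoop-ozone/datanode/data429418/hdds | DataChecksum=0 | "
    "Container has 1 error: MISSING_CHUNKS_DIR for file "
    "/hadoop-ozone/datanode/data429418/hdds/CID-f637f2ff-4884-45d1-81d2-82b4a936cdfd"
    "/current/containerDir0/1/chunks "
    "with exception: java.io.FileNotFoundException: Chunks directory "
    "/hadoop-ozone/datanode/data429418/hdds/CID-f637f2ff-4884-45d1-81d2-82b4a936cdfd"
    "/current/containerDir0/1/chunks not found. |",
]


def parse_entry(line):
    """Split one pipe-delimited entry into a dict (hypothetical helper).

    Assumes field 0 is the timestamp and field 1 the level; later fields are
    either key=value pairs or a single free-text message.
    """
    fields = [f.strip() for f in line.split("|") if f.strip()]
    entry = {
        "time": datetime.strptime(fields[0], "%Y-%m-%d %H:%M:%S,%f"),
        "level": fields[1],
    }
    for field in fields[2:]:
        key, sep, value = field.partition("=")
        if sep and " " not in key:
            entry[key] = value
        else:
            entry["message"] = field
    return entry


def first_unhealthy(lines, container_id):
    """Return the timestamp of the first UNHEALTHY entry for a container, or None."""
    for line in lines:
        entry = parse_entry(line)
        if entry.get("ID") == str(container_id) and entry.get("State") == "UNHEALTHY":
            return entry["time"]
    return None


if __name__ == "__main__":
    print(first_unhealthy(LOG_LINES, 1))
```

Comparing this timestamp with the time at which SCM first lists the container as UNHEALTHY gives the reporting delay described in step 5. The parser is inferred from two entries only, so other log messages may need extra handling.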