Sarveksha Yeshavantha Raju created HDDS-15261:
-------------------------------------------------
Summary: Unhealthy container never showed up in the container
report in 30 minutes after corrupting chunks on all datanodes that hold
replicas of that container
Key: HDDS-15261
URL: https://issues.apache.org/jira/browse/HDDS-15261
Project: Apache Ozone
Issue Type: Bug
Reporter: Sarveksha Yeshavantha Raju
Assignee: Sarveksha Yeshavantha Raju
Steps to reproduce
1. Open a key write (associated with container ID 1).
2. Identify the DN replicas for the container and corrupt or delete a chunk
of the container on each datanode.
3. Run container info to check the state of the replica; after around 15
attempts it reported the replica as unhealthy.
4. Each datanode reported that the container was unhealthy after 15 minutes (at
2026-04-17 13:01:04,170, as seen in the datanode logs):
{noformat}
2026-04-17 13:01:04,162 | WARN | ID=1 | Index=0 | BCSID=152 | State=CLOSED | Volume=/hadoop-ozone/datanode/data429418/hdds | DataChecksum=0 | Container data checksum updated from c15e627b to 0 |
2026-04-17 13:01:04,170 | ERROR | ID=1 | Index=0 | BCSID=152 | State=UNHEALTHY | Volume=/hadoop-ozone/datanode/data429418/hdds | DataChecksum=0 | Container has 1 error: MISSING_CHUNKS_DIR for file /hadoop-ozone/datanode/data429418/hdds/CID-f637f2ff-4884-45d1-81d2-82b4a936cdfd/current/containerDir0/1/chunks with exception: java.io.FileNotFoundException: Chunks directory /hadoop-ozone/datanode/data429418/hdds/CID-f637f2ff-4884-45d1-81d2-82b4a936cdfd/current/containerDir0/1/chunks not found. |
{noformat}
5. Trigger the container report, expecting the UNHEALTHY container count to
be incremented. SCM did not reflect the unhealthy container even after 30
minutes.
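
To pin down exactly when each datanode marked the replica UNHEALTHY (step 4), the datanode container log can be scanned mechanically. The following is a minimal Python sketch, assuming the pipe-delimited layout seen in the log excerpt above with each entry on a single line. The helper names `parse_container_log_line` and `first_unhealthy_timestamp` are hypothetical and not part of Ozone.

```python
import re

# Matches "Key=Value" fields such as "State=CLOSED" or "Volume=/path".
FIELD_RE = re.compile(r"^(\w+)=(.*)$")


def parse_container_log_line(line):
    """Parse one pipe-delimited container log line into a dict.

    Assumed layout (taken from the excerpt in this report):
    timestamp | level | Key=Value | ... | free-text message |
    """
    parts = [p.strip() for p in line.split("|")]
    parts = [p for p in parts if p]  # drop the empty piece after the last "|"
    if len(parts) < 2:
        raise ValueError("not a container log line: %r" % line)
    record = {"timestamp": parts[0], "level": parts[1], "message": ""}
    for part in parts[2:]:
        match = FIELD_RE.match(part)
        if match:
            record[match.group(1)] = match.group(2)
        else:
            # Anything that is not Key=Value is treated as the message.
            record["message"] = part
    return record


def first_unhealthy_timestamp(lines, container_id="1"):
    """Return the timestamp of the first UNHEALTHY entry for a container."""
    for line in lines:
        if not line.strip():
            continue
        record = parse_container_log_line(line)
        if record.get("ID") == container_id and record.get("State") == "UNHEALTHY":
            return record["timestamp"]
    return None
```

Running `first_unhealthy_timestamp` over each datanode's container log gives the time the scanner flagged the replica, which can then be compared against the time SCM's container report was checked.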