[ https://issues.apache.org/jira/browse/HDDS-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16838920#comment-16838920 ]
Hrishikesh Gadre commented on HDDS-1201:
----------------------------------------
h2. *_Technical Description: Details on the technical approach planned_*
High-level workflow
* The DataNode will compute the list of corrupted containers as a background
activity.
* This list of corrupted container ids will be shared with SCM as part of the
next heartbeat message.
* The SCM will process this list and mark the corresponding replicas as
corrupted.
* Currently the state of the container replicas is stored in memory only (and
not persisted to disk). This feature does not change that model. That means if
the SCM crashes and comes back up, it will lose its knowledge of corrupted
containers, and that knowledge will need to be rebuilt over time from
subsequent DataNode reports.
* The SCM will provide metrics about the corrupted container replicas via the
JMX API.
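
The workflow above can be sketched roughly as follows. This is only an illustration under stated assumptions: the class names (Heartbeat, ScmReplicaState) and their methods are hypothetical and are not taken from the Ozone code base. It shows a heartbeat carrying corrupted container ids and an SCM-side map that is held purely in memory, so a fresh instance (as after a restart) starts out empty.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

/** Hypothetical sketch; none of these types are actual Ozone classes. */
final class CorruptionWorkflowSketch {

  /** Stand-in for the heartbeat payload a DataNode sends to the SCM. */
  static final class Heartbeat {
    final UUID datanodeId;
    final List<Long> corruptedContainerIds;

    Heartbeat(UUID datanodeId, List<Long> corruptedContainerIds) {
      this.datanodeId = datanodeId;
      this.corruptedContainerIds =
          Collections.unmodifiableList(new ArrayList<>(corruptedContainerIds));
    }
  }

  /** SCM-side replica state, kept in memory only (nothing is persisted). */
  static final class ScmReplicaState {
    // containerId -> DataNodes whose replica was reported as corrupted
    private final Map<Long, Set<UUID>> corrupted = new HashMap<>();

    /** Marks every replica listed in the heartbeat as corrupted. */
    synchronized void processHeartbeat(Heartbeat hb) {
      for (long id : hb.corruptedContainerIds) {
        corrupted.computeIfAbsent(id, k -> new HashSet<>()).add(hb.datanodeId);
      }
    }

    /** Returns true if the given DataNode's replica is marked corrupted. */
    synchronized boolean isCorrupted(long containerId, UUID datanodeId) {
      return corrupted
          .getOrDefault(containerId, Collections.emptySet())
          .contains(datanodeId);
    }
  }
}
```

Because the map lives only in the ScmReplicaState instance, creating a new instance models the SCM restart case: the corruption markers reappear only once DataNodes report them again.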
Out-of-scope work items
* Ability to take corrective action (e.g. schedule container replication) when
a corrupted replica is reported.
*Leverage Incremental Container Report functionality*
DataNode changes:
* Ensure that the data scrubbing framework in the DataNode marks a corrupted
container as unhealthy and sends an Incremental Container Report (ICR) as part
of that step.
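
A minimal sketch of the DataNode side, assuming a scrubber that is handed an integrity check and a callback for sending reports. The names Scrubber, IcrSender and scanOnce are hypothetical, and the corruption check is left as a caller-supplied predicate because the comment does not say how corruption is detected.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongPredicate;

/** Hypothetical sketch; not the actual Ozone data scrubbing framework. */
final class ScrubberSketch {

  enum State { CLOSED, UNHEALTHY }

  /** Stand-in for whatever sends an Incremental Container Report to SCM. */
  interface IcrSender {
    void send(long containerId, State state);
  }

  static final class Scrubber {
    private final Map<Long, State> containers = new ConcurrentHashMap<>();
    private final LongPredicate isCorrupt; // assumed integrity check
    private final IcrSender sender;

    Scrubber(LongPredicate isCorrupt, IcrSender sender) {
      this.isCorrupt = isCorrupt;
      this.sender = sender;
    }

    void addContainer(long containerId) {
      containers.put(containerId, State.CLOSED);
    }

    /** One background pass: mark corrupted containers and report each once. */
    void scanOnce() {
      for (Map.Entry<Long, State> e : containers.entrySet()) {
        long id = e.getKey();
        if (e.getValue() != State.UNHEALTHY && isCorrupt.test(id)) {
          containers.put(id, State.UNHEALTHY);
          // The report is sent in the same step as the state change.
          sender.send(id, State.UNHEALTHY);
        }
      }
    }

    State stateOf(long containerId) {
      return containers.get(containerId);
    }
  }
}
```

Containers that are already UNHEALTHY are skipped, so repeated scans do not resend a report for the same container.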
SCM changes:
* SCM should filter out unhealthy replicas when a client requests the replicas
of a given container.
* Add an API in SCMMXBean to get an aggregated count of corrupted container
replicas (along with the concrete implementation in StorageContainerManager).
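
The two SCM changes could look roughly like the sketch below. Everything here is an assumption made for illustration: CorruptionStatsMXBean stands in for the method that would be added to SCMMXBean, ReplicaManager stands in for the implementation in StorageContainerManager, and registration with an MBean server is left out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

/** Hypothetical sketch; not the actual SCM classes. */
final class ScmReplicaFilterSketch {

  enum ReplicaState { HEALTHY, UNHEALTHY }

  static final class Replica {
    final String datanode;
    final ReplicaState state;

    Replica(String datanode, ReplicaState state) {
      this.datanode = datanode;
      this.state = state;
    }
  }

  /** Stand-in for the getter that would be exposed through JMX. */
  interface CorruptionStatsMXBean {
    long getCorruptedReplicaCount();
  }

  static final class ReplicaManager implements CorruptionStatsMXBean {
    private final Map<Long, List<Replica>> replicas = new ConcurrentHashMap<>();

    /** Records the latest reported state of one DataNode's replica. */
    void report(long containerId, String datanode, ReplicaState state) {
      replicas.compute(containerId, (id, old) -> {
        List<Replica> next = new ArrayList<>();
        if (old != null) {
          for (Replica r : old) {
            if (!r.datanode.equals(datanode)) {
              next.add(r);
            }
          }
        }
        next.add(new Replica(datanode, state));
        return next;
      });
    }

    /** Replica locations handed to clients: unhealthy ones are filtered out. */
    List<String> healthyDatanodes(long containerId) {
      return replicas.getOrDefault(containerId, Collections.emptyList()).stream()
          .filter(r -> r.state == ReplicaState.HEALTHY)
          .map(r -> r.datanode)
          .collect(Collectors.toList());
    }

    /** Aggregated count of corrupted replicas across all containers. */
    @Override
    public long getCorruptedReplicaCount() {
      return replicas.values().stream()
          .flatMap(List::stream)
          .filter(r -> r.state == ReplicaState.UNHEALTHY)
          .count();
    }
  }
}
```

Keeping the count as a derived value over the in-memory replica map means the metric follows the same model as the replica state itself: it resets when the SCM restarts.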
> Reporting Corruptions in Containers to SCM
> ------------------------------------------
>
> Key: HDDS-1201
> URL: https://issues.apache.org/jira/browse/HDDS-1201
> Project: Hadoop Distributed Data Store
> Issue Type: Sub-task
> Components: Ozone Datanode, SCM
> Reporter: Supratim Deka
> Assignee: Hrishikesh Gadre
> Priority: Major
>
> Add protocol message and handling to report container corruptions to the SCM.
> Also add basic recovery handling in SCM.