[ https://issues.apache.org/jira/browse/HDDS-9011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Siddhant Sangwan reassigned HDDS-9011:
--------------------------------------
Assignee: Siddhant Sangwan
> [scanner-ec] Unhealthy replica not replaced with a healthy replica in a rack-scatter environment
> ------------------------------------------------------------------------------------------------
>
> Key: HDDS-9011
> URL: https://issues.apache.org/jira/browse/HDDS-9011
> Project: Apache Ozone
> Issue Type: Bug
> Components: SCM
> Reporter: Jyotirmoy Sinha
> Assignee: Siddhant Sangwan
> Priority: Major
>
> Steps:
> * Create a volume, bucket and key
> * Close the container holding the above key and simulate an unhealthy replica
> on one of the datanodes
> * The unhealthy replica is not replaced by a healthy replica in a rack-scatter
> environment
> Error stack trace from ozone-scm.log:
> {code:java}
> 2023-07-01 14:21:32,680 [Under Replicated Processor] WARN
> org.apache.hadoop.hdds.scm.container.placement.algorithms.SCMContainerPlacementRackScatter:
> Placement policy could not choose the enough nodes from available racks.
> Chosen nodes size from Unique Racks: 0, but required nodes to choose from
> Unique Racks: 1 do not match. Available racks count: 1, Excluded nodes count:
> 2, UsedNodes count: 4
> 2023-07-01 14:21:32,681 [Under Replicated Processor] ERROR
> org.apache.hadoop.hdds.scm.container.replication.UnhealthyReplicationProcessor:
> Error processing Health result of class: class
> org.apache.hadoop.hdds.scm.container.replication.ContainerHealthResult$UnderReplicatedHealthResult
> for container ContainerInfo{id=#4001, state=CLOSED,
> stateEnterTime=2023-07-01T13:56:35.605330Z,
> pipelineID=PipelineID=8d38087e-4071-481a-90b4-070eb7b97349,
> owner=om1546336661}
> org.apache.hadoop.hdds.scm.exceptions.SCMException: Placement Policy: class
> org.apache.hadoop.hdds.scm.container.placement.algorithms.SCMContainerPlacementRackScatter
> did not return any nodes. Number of required Nodes 1, Datasize Required:
> 1073741824
> at org.apache.hadoop.hdds.scm.container.replication.ReplicationManagerUtil.getTargetDatanodes(ReplicationManagerUtil.java:93)
> at org.apache.hadoop.hdds.scm.container.replication.ECUnderReplicationHandler.getTargetDatanodes(ECUnderReplicationHandler.java:390)
> at org.apache.hadoop.hdds.scm.container.replication.ECUnderReplicationHandler.processMissingIndexes(ECUnderReplicationHandler.java:301)
> at org.apache.hadoop.hdds.scm.container.replication.ECUnderReplicationHandler.processAndSendCommands(ECUnderReplicationHandler.java:160)
> at org.apache.hadoop.hdds.scm.container.replication.ReplicationManager.processUnderReplicatedContainer(ReplicationManager.java:773)
> at org.apache.hadoop.hdds.scm.container.replication.UnderReplicatedProcessor.sendDatanodeCommands(UnderReplicatedProcessor.java:58)
> at org.apache.hadoop.hdds.scm.container.replication.UnderReplicatedProcessor.sendDatanodeCommands(UnderReplicatedProcessor.java:27)
> at org.apache.hadoop.hdds.scm.container.replication.UnhealthyReplicationProcessor.processContainer(UnhealthyReplicationProcessor.java:152)
> at org.apache.hadoop.hdds.scm.container.replication.UnhealthyReplicationProcessor.processAll(UnhealthyReplicationProcessor.java:112)
> at org.apache.hadoop.hdds.scm.container.replication.UnhealthyReplicationProcessor.run(UnhealthyReplicationProcessor.java:161)
> at java.base/java.lang.Thread.run(Thread.java:834)
> 2023-07-01 14:21:32,681 [Under Replicated Processor] INFO
> org.apache.hadoop.hdds.scm.container.replication.UnhealthyReplicationProcessor:
> Processed 0 containers with health state counts {}, failed processing 2,
> deferred due to load 0
> 2023-07-01 14:22:02,682 [Under Replicated Processor] WARN
> org.apache.hadoop.hdds.scm.container.placement.algorithms.SCMContainerPlacementRackScatter:
> Placement policy cannot choose the enough racks. Required nodes size: 1 is
> less than required number of racks to choose: 2.Total number of Required
> Racks: 5 Used Racks Count: 3, Required Nodes count: 1
> 2023-07-01 14:22:02,682 [Under Replicated Processor] ERROR
> org.apache.hadoop.hdds.scm.container.replication.UnhealthyReplicationProcessor:
> Error processing Health result of class: class
> org.apache.hadoop.hdds.scm.container.replication.ContainerHealthResult$UnderReplicatedHealthResult
> for container ContainerInfo{id=#2003, state=CLOSED,
> stateEnterTime=2023-07-01T13:42:46.984Z,
> pipelineID=PipelineID=ba15da2d-e90f-4128-ac2b-c720eea22146,
> owner=om1546336657}{code}
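The 14:22:02 WARN line reports a rack-count check that can be reproduced with simple arithmetic: 5 required racks minus 3 used racks leaves 2 racks to choose, which is more than the 1 node requested, so placement gives up. The sketch below mirrors only that logged condition. It is NOT the SCMContainerPlacementRackScatter source: the class RackScatterCheck and its methods are hypothetical, and the formula "racks to choose = total required racks - used racks" is an assumption inferred from the numbers in the log.

```java
// Hypothetical sketch: reproduces the arithmetic reported in the
// 14:22:02 WARN line. It is NOT the actual Ozone placement policy code.
public final class RackScatterCheck {

    // Assumed: racks still needing a replica = racks wanted in total
    // minus racks that already hold one (inferred from the log values).
    static int racksToChoose(int totalRequiredRacks, int usedRacks) {
        return Math.max(0, totalRequiredRacks - usedRacks);
    }

    // Mirrors the logged condition: the request fails when fewer nodes
    // are requested than there are racks left to choose.
    static boolean canSatisfy(int nodesRequired, int totalRequiredRacks, int usedRacks) {
        return nodesRequired >= racksToChoose(totalRequiredRacks, usedRacks);
    }

    public static void main(String[] args) {
        // Values taken from the 14:22:02 WARN line.
        int nodesRequired = 1;
        int totalRequiredRacks = 5;
        int usedRacks = 3;
        System.out.println("racks to choose: " + racksToChoose(totalRequiredRacks, usedRacks));
        System.out.println("can satisfy: " + canSatisfy(nodesRequired, totalRequiredRacks, usedRacks));
    }
}
```

With these values the sketch prints "racks to choose: 2" and "can satisfy: false", matching the warning even though two unused racks are available for the single missing replica. In the first log record, the same kind of placement failure surfaces as an SCMException thrown from ReplicationManagerUtil.getTargetDatanodes.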
--
This message was sent by Atlassian Jira
(v8.20.10#820010)