[ 
https://issues.apache.org/jira/browse/HDDS-8831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-8831:
---------------------------------
    Labels: pull-request-available  (was: )

> UnsupportedOperationException when there are more replication tasks than limit
> ------------------------------------------------------------------------------
>
>                 Key: HDDS-8831
>                 URL: https://issues.apache.org/jira/browse/HDDS-8831
>             Project: Apache Ozone
>          Issue Type: Sub-task
>          Components: SCM
>            Reporter: Varsha Ravi
>            Assignee: Attila Doroszlai
>            Priority: Blocker
>              Labels: pull-request-available
>
> An UnsupportedOperationException is thrown when there are more re-replication 
> tasks than the hdds.scm.replication.datanode.replication.limit value allows. 
> For EC reconstruction tasks, if 
> hdds.scm.replication.datanode.replication.limit is set to 2, the 
> reconstruction never completes and the container remains under-replicated when 
> a few of the DNs are down. (This is because the EC reconstruction weight is 
> 3, which is higher than the limit of 2.)
> For RATIS, or for EC with a limit of 3 or more, the replication 
> tasks complete without issues in subsequent iterations of the processor.
>  
> {noformat}
> 2023-06-12 11:28:37,912 [Under Replicated Processor] ERROR org.apache.hadoop.hdds.scm.container.replication.UnhealthyReplicationProcessor: Error processing Health result of class: class org.apache.hadoop.hdds.scm.container.replication.ContainerHealthResult$UnderReplicatedHealthResult for container ContainerInfo{id=#2003, state=CLOSED, stateEnterTime=2023-06-12T11:01:28.843Z, pipelineID=PipelineID=e8fa71c9-7f9a-4b6f-a4ca-8cb01d78a646, owner=om2}
> java.lang.UnsupportedOperationException
>     at com.google.common.collect.ImmutableList.set(ImmutableList.java:528)
>     at org.apache.hadoop.hdds.scm.SCMCommonPlacementPolicy.validateDatanodes(SCMCommonPlacementPolicy.java:162)
>     at org.apache.hadoop.hdds.scm.SCMCommonPlacementPolicy.chooseDatanodes(SCMCommonPlacementPolicy.java:208)
>     at org.apache.hadoop.hdds.scm.container.replication.ReplicationManagerUtil.getTargetDatanodes(ReplicationManagerUtil.java:83)
>     at org.apache.hadoop.hdds.scm.container.replication.ECUnderReplicationHandler.getTargetDatanodes(ECUnderReplicationHandler.java:396)
>     at org.apache.hadoop.hdds.scm.container.replication.ECUnderReplicationHandler.processMissingIndexes(ECUnderReplicationHandler.java:307)
>     at org.apache.hadoop.hdds.scm.container.replication.ECUnderReplicationHandler.processAndSendCommands(ECUnderReplicationHandler.java:161)
>     at org.apache.hadoop.hdds.scm.container.replication.ReplicationManager.processUnderReplicatedContainer(ReplicationManager.java:769)
>     at org.apache.hadoop.hdds.scm.container.replication.UnderReplicatedProcessor.sendDatanodeCommands(UnderReplicatedProcessor.java:58)
>     at org.apache.hadoop.hdds.scm.container.replication.UnderReplicatedProcessor.sendDatanodeCommands(UnderReplicatedProcessor.java:27)
>     at org.apache.hadoop.hdds.scm.container.replication.UnhealthyReplicationProcessor.processContainer(UnhealthyReplicationProcessor.java:148)
>     at org.apache.hadoop.hdds.scm.container.replication.UnhealthyReplicationProcessor.processAll(UnhealthyReplicationProcessor.java:115)
>     at org.apache.hadoop.hdds.scm.container.replication.UnhealthyReplicationProcessor.run(UnhealthyReplicationProcessor.java:157)
>     at java.base/java.lang.Thread.run(Thread.java:834)
> {noformat}
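
The top frame of the trace is ImmutableList.set, i.e. an in-place update attempted on an immutable list. The sketch below reproduces that kind of failure in isolation. It is not the Ozone code: the class and method names are hypothetical, the JDK's List.of stands in for Guava's ImmutableList so that the example needs no dependencies, and copying to a mutable list is only one possible way to avoid the exception, not necessarily the fix applied for this issue.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the Ozone implementation.
public class ImmutableListSetSketch {

  // Stand-in for a validation step that rewrites an entry in place.
  // Throws UnsupportedOperationException if 'nodes' is unmodifiable.
  static List<String> replaceInPlace(List<String> nodes, int index,
      String replacement) {
    nodes.set(index, replacement);
    return nodes;
  }

  // Defensive variant: copy into a mutable list before modifying it,
  // leaving the caller's list untouched.
  static List<String> replaceOnCopy(List<String> nodes, int index,
      String replacement) {
    List<String> copy = new ArrayList<>(nodes);
    copy.set(index, replacement);
    return copy;
  }

  public static void main(String[] args) {
    // List.of returns an unmodifiable list (assumed stand-in for
    // Guava's ImmutableList seen in the stack trace).
    List<String> chosen = List.of("dn1", "dn2", "dn3");
    try {
      replaceInPlace(chosen, 1, "dn4");
    } catch (UnsupportedOperationException e) {
      System.out.println("in-place set failed: " + e);
    }
    System.out.println("copy-based replace: "
        + replaceOnCopy(chosen, 1, "dn4"));
  }
}
```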

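The limit/weight arithmetic in the description can be illustrated the same way. This is a minimal sketch under stated assumptions: the admission rule (queued weight plus task weight must not exceed the limit) and all names are hypothetical, and the weight of 1 used for the RATIS case is an assumption. Only the EC reconstruction weight of 3 and the limit of 2 come from the report.

```java
// Hypothetical sketch of a per-datanode admission check; not Ozone code.
public class ReplicationLimitSketch {

  // Assumed rule: a datanode accepts a task only if the task's weight
  // still fits under the configured limit.
  static boolean canAccept(int queuedWeight, int taskWeight, int limit) {
    return queuedWeight + taskWeight <= limit;
  }

  public static void main(String[] args) {
    int ecReconstructionWeight = 3; // from the issue description
    int ratisWeight = 1;            // assumption, for illustration only

    // Limit 2: even an idle datanode can never take an EC reconstruction.
    System.out.println(canAccept(0, ecReconstructionWeight, 2));
    // Limit 3: an idle datanode can take it.
    System.out.println(canAccept(0, ecReconstructionWeight, 3));
    // Limit 2: a RATIS task of assumed weight 1 fits on an idle datanode.
    System.out.println(canAccept(0, ratisWeight, 2));
  }
}
```

Under this rule, a weight above the limit excludes every datanode regardless of its current load, which matches the reported behaviour that reconstruction never completes with a limit of 2.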


--
This message was sent by Atlassian Jira
(v8.20.10#820010)
