[ 
https://issues.apache.org/jira/browse/HDDS-7292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Doroszlai resolved HDDS-7292.
------------------------------------
    Resolution: Cannot Reproduce

> [Ozone EC] SCM went down with replica index mismatch in replicaSet
> ------------------------------------------------------------------
>
>                 Key: HDDS-7292
>                 URL: https://issues.apache.org/jira/browse/HDDS-7292
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: EC
>            Reporter: Nilotpal Nandi
>            Priority: Major
>
> Steps taken:
> 1) Wrote the TPC-DS Hive dataset.
> 2) Shut down DNs for online reconstruction.
> 3) Initially, queries were successful.
> 4) SCMs then went down with java.lang.IllegalArgumentException: Replica Index in 
> replicaSet for containerID 4012must be between 1 and 5. But the given index 
> is: 0
> ozone-scm.log:
> {noformat}
> 2022-09-22 20:17:42,722 INFO 
> org.apache.hadoop.hdds.scm.container.replication.LegacyReplicationManager: 
> Sending delete container command for container #5070 to datanode 
> 507a0a6a-0745-49e7-9b42-c9bfd68eace8{ip: 172.27.14.11, host: 
> quasar-nkmwlh-1.quasar-nkmwlh.root.hwx.site, ports: [REPLICATION=9886, 
> RATIS=9858, RATIS_ADMIN=9857, RATIS_SERVER=9856, STANDALONE=9859], 
> networkLocation: /default, certSerialId: null, persistedOpState: IN_SERVICE, 
> persistedOpStateExpiryEpochSec: 0}
> 2022-09-22 20:17:42,723 INFO 
> org.apache.hadoop.hdds.scm.container.replication.ReplicationManager: 
> Replication Monitor Thread took 5 milliseconds for processing 238 containers.
> 2022-09-22 20:17:54,645 INFO 
> org.apache.hadoop.hdds.scm.pipeline.PipelineReportHandler: Reported pipeline 
> PipelineID=50b59b4b-4e14-4e00-9051-eb4689467a11 is not found
> 2022-09-22 20:17:54,661 INFO SecurityLogger.org.apache.hadoop.ipc.Server: 
> Auth successful for 
> recon/[email protected] 
> (auth:KERBEROS)
> 2022-09-22 20:17:54,666 INFO 
> SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager:
>  Authorization successful for 
> recon/[email protected] 
> (auth:KERBEROS) for protocol=interface 
> org.apache.hadoop.hdds.scm.protocol.StorageContainerLocationProtocol
> 2022-09-22 20:17:54,667 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 35 on 9860, call Call#89 Retry#0 
> org.apache.hadoop.hdds.scm.protocol.StorageContainerLocationProtocol.submitRequest
>  from 172.27.136.69:45687
> org.apache.hadoop.hdds.scm.pipeline.PipelineNotFoundException: 
> PipelineID=50b59b4b-4e14-4e00-9051-eb4689467a11 not found
>         at 
> org.apache.hadoop.hdds.scm.pipeline.PipelineStateMap.getPipeline(PipelineStateMap.java:157)
>         at 
> org.apache.hadoop.hdds.scm.pipeline.PipelineStateManagerImpl.getPipeline(PipelineStateManagerImpl.java:137)
>         at jdk.internal.reflect.GeneratedMethodAccessor31.invoke(Unknown 
> Source)
>         at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>         at 
> org.apache.hadoop.hdds.scm.ha.SCMHAInvocationHandler.invokeLocal(SCMHAInvocationHandler.java:87)
>         at 
> org.apache.hadoop.hdds.scm.ha.SCMHAInvocationHandler.invoke(SCMHAInvocationHandler.java:72)
>         at com.sun.proxy.$Proxy17.getPipeline(Unknown Source)
>         at 
> org.apache.hadoop.hdds.scm.pipeline.PipelineManagerImpl.getPipeline(PipelineManagerImpl.java:272)
>         at 
> org.apache.hadoop.hdds.scm.server.SCMClientProtocolServer.getPipeline(SCMClientProtocolServer.java:693)
>         at 
> org.apache.hadoop.hdds.scm.protocol.StorageContainerLocationProtocolServerSideTranslatorPB.getPipeline(StorageContainerLocationProtocolServerSideTranslatorPB.java:860)
>         at 
> org.apache.hadoop.hdds.scm.protocol.StorageContainerLocationProtocolServerSideTranslatorPB.processRequest(StorageContainerLocationProtocolServerSideTranslatorPB.java:581)
>         at 
> org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:87)
>         at 
> org.apache.hadoop.hdds.scm.protocol.StorageContainerLocationProtocolServerSideTranslatorPB.submitRequest(StorageContainerLocationProtocolServerSideTranslatorPB.java:213)
>         at 
> org.apache.hadoop.hdds.protocol.proto.StorageContainerLocationProtocolProtos$StorageContainerLocationProtocolService$2.callBlockingMethod(StorageContainerLocationProtocolProtos.java:61195)
>         at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:533)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
>         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:989)
>         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:917)
>         at java.base/java.security.AccessController.doPrivileged(Native 
> Method)
>         at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
>         at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2894)
> 2022-09-22 20:17:57,141 INFO 
> org.apache.hadoop.hdds.scm.container.replication.OverReplicatedProcessor: 
> Processed 0 over replicated containers, failed processing 0
> 2022-09-22 20:17:57,164 INFO 
> org.apache.hadoop.hdds.scm.container.replication.UnderReplicatedProcessor: 
> Processed 0 under replicated containers, failed processing 0
> 2022-09-22 20:17:57,724 INFO 
> org.apache.hadoop.hdds.scm.container.replication.ReplicationManager: Sending 
> close container command for container #4012 to datanode 
> 323115a5-980c-42a6-be10-43f30ddec3f5{ip: 172.27.10.10, host: 
> quasar-nkmwlh-4.quasar-nkmwlh.root.hwx.site, ports: [REPLICATION=9886, 
> RATIS=9858, RATIS_ADMIN=9857, RATIS_SERVER=9856, STANDALONE=9859], 
> networkLocation: /default, certSerialId: null, persistedOpState: IN_SERVICE, 
> persistedOpStateExpiryEpochSec: 0}.
> 2022-09-22 20:17:57,729 ERROR 
> org.apache.hadoop.hdds.scm.container.replication.ReplicationManager: 
> Exception in Replication Monitor Thread.
> java.lang.IllegalArgumentException: Replica Index in replicaSet for 
> containerID 4012must be between 1 and 5. But the given index is: 0
>         at 
> org.apache.hadoop.hdds.scm.container.replication.ECContainerReplicaCount.ensureIndexWithinBounds(ECContainerReplicaCount.java:426)
>         at 
> org.apache.hadoop.hdds.scm.container.replication.ECContainerReplicaCount.<init>(ECContainerReplicaCount.java:99)
>         at 
> org.apache.hadoop.hdds.scm.container.replication.ECContainerHealthCheck.checkHealth(ECContainerHealthCheck.java:50)
>         at 
> org.apache.hadoop.hdds.scm.container.replication.ReplicationManager.processContainer(ReplicationManager.java:423)
>         at 
> org.apache.hadoop.hdds.scm.container.replication.ReplicationManager.processAll(ReplicationManager.java:278)
>         at 
> org.apache.hadoop.hdds.scm.container.replication.ReplicationManager.run(ReplicationManager.java:538)
>         at java.base/java.lang.Thread.run(Thread.java:834)
> 2022-09-22 20:17:57,731 INFO org.apache.hadoop.util.ExitUtil: Exiting with 
> status 1: java.lang.IllegalArgumentException: Replica Index in replicaSet for 
> containerID 4012must be between 1 and 5. But the given index is: 0
> 2022-09-22 20:17:57,736 INFO 
> org.apache.hadoop.hdds.scm.server.StorageContainerManager: Container Balancer 
> is not running.
> 2022-09-22 20:17:57,736 INFO 
> org.apache.hadoop.hdds.scm.server.StorageContainerManager: Stopping 
> Replication Manager Service.
> 2022-09-22 20:17:57,736 INFO 
> org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter: 
> SHUTDOWN_MSG:
> /************************************************************
> SHUTDOWN_MSG: Shutting down StorageContainerManager at 
> quasar-nkmwlh-2.quasar-nkmwlh.root.hwx.site/172.27.136.69
> ************************************************************/
> {noformat}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

