[jira] [Updated] (HDDS-8172) Duplicate replicateContainerCommand Being Sent by SCM

Stephen O'Donnell (Jira) Wed, 15 Mar 2023 14:37:00 -0700


     [ https://issues.apache.org/jira/browse/HDDS-8172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Stephen O'Donnell updated HDDS-8172:
------------------------------------
    Description: 
For an EC container that has two replicas for the same replica index, one 
decommissioning and one in_maintenance, the decommission logic in 
ECUnderReplicationHandler can send a replication command for that index, and 
then the maintenance logic can send a second replication command for the same 
container to a different target. If both commands succeed, the container will 
likely end up over-replicated.

To solve this, we probably need to adjust the pending ops between each stage of 
the processing, so that the maintenance logic sees the index as "fixed by 
pending" and does not send the second command.
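A minimal Java sketch of the proposed fix, under a deliberately simplified model: the class PendingOpsSketch, its methods, and the target names are hypothetical and do not correspond to Ozone's actual ReplicationManager or ECUnderReplicationHandler code. Each stage skips a replica index that already has a pending ADD op; recording the decommission stage's command as a pending op is what lets the maintenance stage treat the index as "fixed by pending".

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the HDDS-8172 scenario; names are illustrative only.
class PendingOpsSketch {

  /**
   * One processing stage (decommission or maintenance). It sends a replicate
   * command for the index unless a pending ADD op already covers it.
   */
  private static void runStage(int replicaIndex, String target, Set<Integer> pendingAdds,
      boolean recordPending, List<String> sent) {
    if (pendingAdds.contains(replicaIndex)) {
      return; // "fixed by pending": a copy of this index is already in flight
    }
    sent.add("replicate index " + replicaIndex + " -> " + target);
    if (recordPending) {
      pendingAdds.add(replicaIndex); // later stages will see this pending op
    }
  }

  /** Runs both stages for one index and returns the commands that would be sent. */
  static List<String> process(int replicaIndex, boolean updatePendingBetweenStages) {
    Set<Integer> pendingAdds = new HashSet<>(); // replica indexes with an in-flight copy
    List<String> sent = new ArrayList<>();
    // Stage 1: decommission logic wants a copy of the decommissioning replica.
    runStage(replicaIndex, "target-A", pendingAdds, updatePendingBetweenStages, sent);
    // Stage 2: maintenance logic wants a copy of the in_maintenance replica.
    runStage(replicaIndex, "target-B", pendingAdds, updatePendingBetweenStages, sent);
    return sent;
  }

  public static void main(String[] args) {
    System.out.println("without update: " + process(5, false));
    System.out.println("with update:    " + process(5, true));
  }
}
```

Running main prints two commands (to target-A and target-B) when pending ops are not updated between stages, and a single command when they are.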

  was:
Duplicate replication commands [replicateContainerCommand] are being sent by SCM 
for the same container.

 
{code:java}
2023-03-15 04:30:01,642 INFO 
org.apache.hadoop.hdds.scm.container.replication.ReplicationManager: Sending 
command [replicateContainerCommand: containerId: 11001, replicaIndex: 5, 
sourceNodes: 
[56f83447-2cec-4137-8b82-15ee1bc200a9(host-6.host.root.hwx.site/172.27.xxx.xxx)]]
 for container ContainerInfo{id=#11001, state=CLOSED, 
pipelineID=PipelineID=1fab690a-2176-4fb0-a0e8-4243f57af4fd, 
stateEnterTime=2023-03-15T04:09:54.255Z, owner=om2} to 
dfcc61cb-deed-453e-8a8d-c34bb73a4ada(host-1.host.root.hwx.site/172.27.xx.xxx)

2023-03-15 04:30:01,642 INFO 
org.apache.hadoop.hdds.scm.container.replication.ReplicationManager: Sending 
command [replicateContainerCommand: containerId: 11001, replicaIndex: 5, 
sourceNodes: 
[56f83447-2cec-4137-8b82-15ee1bc200a9(hostname-6.hostname.root.hwx.site/172.27.xxx.xx)]]
 for container ContainerInfo{id=#11001, state=CLOSED, 
pipelineID=PipelineID=1fab690a-2176-4fb0-a0e8-4243f57af4fd, 
stateEnterTime=2023-03-15T04:09:54.255Z, owner=om2} to 
1b834c42-7a2e-4154-93d4-b8391893d000(hostname-9.hostname.root.hwx.site/172.27.xxx.xx)
 {code}
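The symptom in the log above (two commands for containerId 11001, replicaIndex 5, in the same millisecond) can be checked mechanically. A hypothetical helper, not part of Ozone, that counts replicateContainerCommand entries per containerId and replicaIndex, assuming each log entry sits on a single line:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical diagnostic helper; only the command text format is taken from the log.
class DuplicateCommandScanner {

  // Matches the command text as it appears in the ReplicationManager log lines.
  private static final Pattern CMD = Pattern.compile(
      "replicateContainerCommand: containerId: (\\d+), replicaIndex: (\\d+)");

  /** Counts commands per "containerId/replicaIndex" key and keeps keys seen more than once. */
  static Map<String, Integer> findDuplicates(List<String> logLines) {
    Map<String, Integer> counts = new HashMap<>();
    for (String line : logLines) {
      Matcher m = CMD.matcher(line);
      if (m.find()) {
        counts.merge(m.group(1) + "/" + m.group(2), 1, Integer::sum);
      }
    }
    counts.values().removeIf(c -> c < 2); // drop keys with a single command
    return counts;
  }

  public static void main(String[] args) {
    List<String> lines = List.of(
        "Sending command [replicateContainerCommand: containerId: 11001, replicaIndex: 5, ...",
        "Sending command [replicateContainerCommand: containerId: 11001, replicaIndex: 5, ...");
    System.out.println(findDuplicates(lines)); // prints {11001/5=2}
  }
}
```

The helper ignores timestamps, so a repeated command sent much later (for example, after the first one timed out) would also be flagged.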

> Duplicate replicateContainerCommand Being Sent by SCM
> -----------------------------------------------------
>
>                 Key: HDDS-8172
>                 URL: https://issues.apache.org/jira/browse/HDDS-8172
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: SCM
>            Reporter: Arun Sarin
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
