[ https://issues.apache.org/jira/browse/HDDS-8172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Stephen O'Donnell updated HDDS-8172:
------------------------------------
Description:
For an EC container that has 2 replicas of the same index, one
DECOMMISSIONING and one IN_MAINTENANCE, the decommission logic in
ECUnderReplicationHandler can send a replication command for that index, and
the maintenance logic can then send a second replication command for the same
container and index to a different target. If both commands succeed, the
container will likely end up over-replicated.
To solve this, we probably need to adjust the pending ops between each stage of
the processing, so that the maintenance logic sees the index as "fixed by
pending" and does not send the second command.
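A minimal sketch of the proposed fix, under a heavily simplified model. The `PendingOpsSketch` class, its `State`, `Replica` and `Command` types, and the `process` method are hypothetical; they are not the real ECUnderReplicationHandler or pending-ops API. Each stage skips an index that already has an IN_SERVICE replica or a recorded pending ADD. With `recordPending` false, the decommission and maintenance stages each schedule a copy of index 5; with it true, the maintenance stage treats the index as fixed by pending.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of the HDDS-8172 race. None of these types
// are the real Ozone ReplicationManager / ECUnderReplicationHandler classes.
public class PendingOpsSketch {

  enum State { IN_SERVICE, DECOMMISSIONING, IN_MAINTENANCE }

  record Replica(int index, String node, State state) {}

  record Command(int index, String source, String target) {}

  // True if some IN_SERVICE replica already covers this EC index.
  static boolean hasInService(List<Replica> replicas, int index) {
    for (Replica r : replicas) {
      if (r.index() == index && r.state() == State.IN_SERVICE) {
        return true;
      }
    }
    return false;
  }

  // Runs a decommission stage, then a maintenance stage. When recordPending
  // is true, each scheduled command is recorded as a pending ADD for its
  // index, so later checks treat that index as "fixed by pending".
  static List<Command> process(List<Replica> replicas, boolean recordPending) {
    Set<Integer> pendingAdds = new HashSet<>();
    List<Command> commands = new ArrayList<>();
    for (State stage : List.of(State.DECOMMISSIONING, State.IN_MAINTENANCE)) {
      for (Replica r : replicas) {
        if (r.state() != stage
            || hasInService(replicas, r.index())
            || pendingAdds.contains(r.index())) {
          continue;
        }
        // Hypothetical target selection: a unique placeholder per command.
        commands.add(new Command(r.index(), r.node(),
            "target-" + (commands.size() + 1)));
        if (recordPending) {
          pendingAdds.add(r.index());
        }
      }
    }
    return commands;
  }

  public static void main(String[] args) {
    // Index 5 has two replicas: one decommissioning, one in maintenance.
    List<Replica> replicas = List.of(
        new Replica(5, "dn-6", State.DECOMMISSIONING),
        new Replica(5, "dn-7", State.IN_MAINTENANCE));
    System.out.println(process(replicas, false).size()); // 2 commands
    System.out.println(process(replicas, true).size());  // 1 command
  }
}
```

Without the pending update, index 5 receives two commands to different targets, which is the pattern in the original log; with it, only the decommission stage sends a command.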
was:
Duplicate replication commands (replicateContainerCommand) are being sent by SCM
for the same container:
{code:java}
2023-03-15 04:30:01,642 INFO
org.apache.hadoop.hdds.scm.container.replication.ReplicationManager: Sending
command [replicateContainerCommand: containerId: 11001, replicaIndex: 5,
sourceNodes:
[56f83447-2cec-4137-8b82-15ee1bc200a9(host-6.host.root.hwx.site/172.27.xxx.xxx)]]
for container ContainerInfo{id=#11001, state=CLOSED,
pipelineID=PipelineID=1fab690a-2176-4fb0-a0e8-4243f57af4fd,
stateEnterTime=2023-03-15T04:09:54.255Z, owner=om2} to
dfcc61cb-deed-453e-8a8d-c34bb73a4ada(host-1.host.root.hwx.site/172.27.xx.xxx)
2023-03-15 04:30:01,642 INFO
org.apache.hadoop.hdds.scm.container.replication.ReplicationManager: Sending
command [replicateContainerCommand: containerId: 11001, replicaIndex: 5,
sourceNodes:
[56f83447-2cec-4137-8b82-15ee1bc200a9(hostname-6.hostname.root.hwx.site/172.27.xxx.xx)]]
for container ContainerInfo{id=#11001, state=CLOSED,
pipelineID=PipelineID=1fab690a-2176-4fb0-a0e8-4243f57af4fd,
stateEnterTime=2023-03-15T04:09:54.255Z, owner=om2} to
1b834c42-7a2e-4154-93d4-b8391893d000(hostname-9.hostname.root.hwx.site/172.27.xxx.xx)
{code}
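To spot this symptom in SCM logs, a small checker could group replicateContainerCommand messages by container ID and replica index and flag any pair sent to more than one target. `DuplicateCommandCheck` is a hypothetical helper, not part of Ozone, and its pattern assumes the message layout shown in the log excerpt above (entries wrapped over several lines, target datanode UUID after "} to").

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Ozone: flags containerId/replicaIndex pairs
// for which SCM logged replicateContainerCommand to more than one target.
public class DuplicateCommandCheck {

  // Assumes the "Sending command [replicateContainerCommand: ...] ... to <uuid>"
  // layout from the log excerpt; DOTALL lets one entry span several lines.
  private static final Pattern SEND = Pattern.compile(
      "replicateContainerCommand: containerId: (\\d+), replicaIndex: (\\d+),"
          + ".*?\\} to\\s+([0-9a-f-]{36})",
      Pattern.DOTALL);

  static Map<String, Set<String>> duplicates(String log) {
    Map<String, Set<String>> targets = new TreeMap<>();
    Matcher m = SEND.matcher(log);
    while (m.find()) {
      String key = m.group(1) + "/" + m.group(2);
      targets.computeIfAbsent(key, k -> new TreeSet<>()).add(m.group(3));
    }
    // Keep only pairs that were sent to two or more distinct targets.
    targets.values().removeIf(t -> t.size() < 2);
    return targets;
  }

  public static void main(String[] args) {
    // Two entries trimmed from the log above.
    String log = String.join("\n",
        "Sending command [replicateContainerCommand: containerId: 11001, replicaIndex: 5,",
        "sourceNodes: [56f83447-2cec-4137-8b82-15ee1bc200a9]]",
        "for container ContainerInfo{id=#11001, state=CLOSED} to",
        "dfcc61cb-deed-453e-8a8d-c34bb73a4ada(host-1)",
        "Sending command [replicateContainerCommand: containerId: 11001, replicaIndex: 5,",
        "sourceNodes: [56f83447-2cec-4137-8b82-15ee1bc200a9]]",
        "for container ContainerInfo{id=#11001, state=CLOSED} to",
        "1b834c42-7a2e-4154-93d4-b8391893d000(hostname-9)");
    System.out.println(duplicates(log));
  }
}
```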
> Duplicate replicateContainerCommand Being Sent by SCM
> -----------------------------------------------------
>
> Key: HDDS-8172
> URL: https://issues.apache.org/jira/browse/HDDS-8172
> Project: Apache Ozone
> Issue Type: Bug
> Components: SCM
> Reporter: Arun Sarin
> Priority: Major
--
This message was sent by Atlassian Jira
(v8.20.10#820010)