[ 
https://issues.apache.org/jira/browse/HDDS-13067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Andika updated HDDS-13067:
-------------------------------
    Target Version/s: 2.1.0, 1.4.2, 2.0.1  (was: 2.1.0, 2.0.1)

> Container Balancer delete commands are sent with an expiration time in the 
> past
> -------------------------------------------------------------------------------
>
>                 Key: HDDS-13067
>                 URL: https://issues.apache.org/jira/browse/HDDS-13067
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: SCM
>    Affects Versions: 1.4.1
>            Reporter: Siddhant Sangwan
>            Assignee: Tejaskriya
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 2.1.0
>
>
> h2. Problem
> This is the method that sends the delete command in MoveManager:
> {code:java}
>   private void sendDeleteCommand(
>       final ContainerInfo containerInfo, final DatanodeDetails datanode)
>       throws ContainerReplicaNotFoundException, ContainerNotFoundException,
>       NotLeaderException {
>     int replicaIndex = getContainerReplicaIndex(
>         containerInfo.containerID(), datanode);
>     long deleteTimeout = moveTimeout - replicationTimeout;
>     long now = clock.millis();
>     replicationManager.sendDeleteCommand(
>         containerInfo, replicaIndex, datanode, true, now + deleteTimeout);
>   }
> {code}
> It calculates deleteTimeout as moveTimeout - replicationTimeout, and then 
> sends the delete command with an SCM expiration timestamp of current time + 
> deleteTimeout. This is wrong: the delete expiration timestamp should be 
> "the time at which the move was started + moveTimeout".
> The following diagram helps visualise this. The key point is that 
> move = replicate + delete.
> {code:java}
> /A/------------------------------------------------/B/-----------/C/
> {code}
> A = move start time
> B = move start time + replication timeout 
> C = move start time + move timeout
> The replicate command gets a duration of replicationTimeout, and the move 
> as a whole gets a duration of moveTimeout.
> So the replicate command should expire at moveStart + replicationTimeout 
> (which the code already does correctly), and the delete command should 
> expire at moveStart + moveTimeout (this is the correction the code needs).
> This bug causes the delete expiration timestamp on the Datanode to be in 
> the past, because Replication Manager (through which the command is 
> actually sent) further reduces the Datanode-side expiration timestamp by 
> event.timeout.datanode.offset. So whenever moveTimeout - replicationTimeout 
> < event.timeout.datanode.offset, the expiration time in the DN is in the 
> past.
> h2. Example and Repro
> For example, consider the following configs:
> hdds.container.balancer.move.replication.timeout=50m, 
> hdds.container.balancer.move.timeout=55m,
> hdds.scm.replication.event.timeout.datanode.offset=6m.
> MoveManager#sendDeleteCommand calls ReplicationManager#sendDeleteCommand with 
> an SCM expiration timestamp of now + moveTimeout - replicationTimeout, that 
> is, now + 55m - 50m = now + 5m.
> The Replication Manager method further calls sendDatanodeCommand, which 
> calculates the Datanode expiration timestamp as
> {code:java}
> datanodeDeadline =
>         scmDeadlineEpochMs - rmConf.getDatanodeTimeoutOffset()
> {code}
> which evaluates to now + 5m - 6m = now - 1m, i.e. one minute in the past.
> We also need to prevent the balancer from being configured this way; that 
> can be handled in a separate Jira: 
> https://issues.apache.org/jira/browse/HDDS-13068.
> h2. Solution
> For this Jira, a simple fix is to record the time at which the move is 
> scheduled in the MoveManager#pendingMoves map, then use that time to 
> calculate the delete expiration timestamp when sending the delete command.
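
The difference between the two deadline calculations described above can be sketched in isolation. This is a minimal illustration only, not the actual MoveManager or ReplicationManager code: the class MoveDeadlineSketch and its method names are hypothetical, and the sketch assumes all timestamps are epoch milliseconds and that the Datanode offset is simply subtracted from the SCM deadline, as in the snippet quoted in the issue.

```java
import java.time.Duration;

// Hypothetical sketch; the class and method names are illustrative and are
// not part of the Ozone code base.
final class MoveDeadlineSketch {
  private MoveDeadlineSketch() { }

  /**
   * Behaviour described in the issue: the SCM deadline for the delete is
   * anchored at the time the delete is sent (nowMs), not at the move start.
   */
  static long currentScmDeleteDeadline(
      long nowMs, Duration moveTimeout, Duration replicationTimeout) {
    return nowMs + moveTimeout.minus(replicationTimeout).toMillis();
  }

  /**
   * Proposed behaviour: the SCM deadline for the delete is anchored at the
   * time the move was scheduled (moveStartMs).
   */
  static long proposedScmDeleteDeadline(long moveStartMs, Duration moveTimeout) {
    return moveStartMs + moveTimeout.toMillis();
  }

  /**
   * Mirrors the quoted calculation:
   * datanodeDeadline = scmDeadlineEpochMs - datanode timeout offset.
   */
  static long datanodeDeadline(long scmDeadlineMs, Duration datanodeOffset) {
    return scmDeadlineMs - datanodeOffset.toMillis();
  }
}
```

With the example configuration (move timeout 55m, replication timeout 50m, Datanode offset 6m), the first calculation yields a Datanode deadline of now - 1m regardless of when the move started, while the second yields moveStart + 49m, which is still in the future if replication finishes, say, 10 minutes into the move.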



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
