[
https://issues.apache.org/jira/browse/HDDS-13067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ivan Andika updated HDDS-13067:
-------------------------------
Target Version/s: 2.1.0, 1.4.2, 2.0.1 (was: 2.1.0, 2.0.1)
> Container Balancer delete commands are sent with an expiration time in the
> past
> -------------------------------------------------------------------------------
>
> Key: HDDS-13067
> URL: https://issues.apache.org/jira/browse/HDDS-13067
> Project: Apache Ozone
> Issue Type: Bug
> Components: SCM
> Affects Versions: 1.4.1
> Reporter: Siddhant Sangwan
> Assignee: Tejaskriya
> Priority: Major
> Labels: pull-request-available
> Fix For: 2.1.0
>
>
> h2. Problem
> This is the method that sends the delete command in MoveManager:
> {code:java}
> private void sendDeleteCommand(
>     final ContainerInfo containerInfo, final DatanodeDetails datanode)
>     throws ContainerReplicaNotFoundException, ContainerNotFoundException,
>     NotLeaderException {
>   int replicaIndex = getContainerReplicaIndex(
>       containerInfo.containerID(), datanode);
>   long deleteTimeout = moveTimeout - replicationTimeout;
>   long now = clock.millis();
>   replicationManager.sendDeleteCommand(
>       containerInfo, replicaIndex, datanode, true, now + deleteTimeout);
> }
> {code}
> It calculates deleteTimeout as moveTimeout - replicationTimeout, and then
> sends the delete command with an SCM expiration timestamp of current time +
> deleteTimeout. This is wrong: the delete expiration timestamp should be
> the time at which the move was started + moveTimeout.
> The following diagram helps to visualise this. The key is that move =
> replicate + delete.
> {code:java}
> /A/------------------------------------------------/B/-----------/C/
> {code}
> A = move start time
> B = move start time + replication timeout
> C = move start time + move timeout
> The time duration that replicate command gets is replicationTimeout, and the
> time duration that the total move gets is moveTimeout.
> So, the timestamp at which replicate command should expire is moveStart +
> replicationTimeout (which is correct in the code). And the time at which the
> delete should expire is moveStart + moveTimeout (this correction needs to be
> done in the code).
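The two deadlines on the A/B/C timeline above can be sketched as follows. This is a minimal illustration, not Ozone code: the class MoveDeadlines and its method names are hypothetical, and the 50 and 55 minute values in main are only sample inputs.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper (not part of Ozone) that mirrors the A/B/C timeline.
final class MoveDeadlines {
  private MoveDeadlines() { }

  /** Point B: the replicate command expires at moveStart + replicationTimeout. */
  static Instant replicateDeadline(Instant moveStart, Duration replicationTimeout) {
    return moveStart.plus(replicationTimeout);
  }

  /** Point C: the delete command should expire at moveStart + moveTimeout. */
  static Instant deleteDeadline(Instant moveStart, Duration moveTimeout) {
    return moveStart.plus(moveTimeout);
  }

  public static void main(String[] args) {
    Instant a = Instant.parse("2025-01-01T00:00:00Z"); // sample move start time
    Instant b = replicateDeadline(a, Duration.ofMinutes(50));
    Instant c = deleteDeadline(a, Duration.ofMinutes(55));
    System.out.println("A=" + a + " B=" + b + " C=" + c);
  }
}
```

Both deadlines are anchored to the move start time, so the delete deadline does not depend on when the delete command happens to be sent.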
> This bug is causing the delete expiration timestamp to be in the past (in the
> Datanode) because Replication Manager (via which the command is actually
> sent) further reduces the Datanode side expiration timestamp by
> event.timeout.datanode.offset. So whenever moveTimeout - replicationTimeout <
> event.timeout.datanode.offset, the expiration time in the DN is in the past.
> h2. Example and Repro
> For example, consider the following configs:
> hdds.container.balancer.move.replication.timeout=50m,
> hdds.container.balancer.move.timeout=55m,
> hdds.scm.replication.event.timeout.datanode.offset=6m.
> MoveManager#sendDeleteCommand calls ReplicationManager#sendDeleteCommand with
> an SCM expiration timestamp of now + moveTimeout - replicationTimeout, which
> is now + 55 minutes - 50 minutes, i.e. now + 5 minutes.
> The Replication Manager method further calls sendDatanodeCommand, which
> calculates the Datanode expiration timestamp as
> {code:java}
> datanodeDeadline =
> scmDeadlineEpochMs - rmConf.getDatanodeTimeoutOffset()
> {code}
> which translates to now + 5 minutes - 6 minutes, which is in the past.
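The arithmetic of this example can be reproduced in isolation. The sketch below is not Ozone code: the class DeleteDeadlineRepro and its constants are hypothetical and simply hard-code the three configuration values listed above, following the same two subtraction steps described in the text.

```java
import java.time.Duration;

// Hypothetical standalone reproduction of the deadline arithmetic (not Ozone code).
final class DeleteDeadlineRepro {
  // Values taken from the example configuration in this issue.
  static final Duration REPLICATION_TIMEOUT = Duration.ofMinutes(50);
  static final Duration MOVE_TIMEOUT = Duration.ofMinutes(55);
  static final Duration DATANODE_OFFSET = Duration.ofMinutes(6);

  private DeleteDeadlineRepro() { }

  /** Datanode-side deadline as currently derived from the time the delete is sent. */
  static long datanodeDeadlineMs(long nowMs) {
    long deleteTimeoutMs = MOVE_TIMEOUT.minus(REPLICATION_TIMEOUT).toMillis();
    long scmDeadlineMs = nowMs + deleteTimeoutMs;      // now + 5 minutes
    return scmDeadlineMs - DATANODE_OFFSET.toMillis(); // minus 6 minutes
  }

  public static void main(String[] args) {
    long now = System.currentTimeMillis();
    long datanodeDeadline = datanodeDeadlineMs(now);
    // Prints "datanode deadline - now = -60000 ms": one minute in the past.
    System.out.println("datanode deadline - now = " + (datanodeDeadline - now) + " ms");
  }
}
```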
> We should also prevent the balancer from being configured this way, which
> can be handled in a separate Jira -
> https://issues.apache.org/jira/browse/HDDS-13068.
> h2. Solution
> For this Jira, a simple fix is to keep the time when the move is scheduled in
> the MoveManager#pendingMoves map, then use that time to calculate the delete
> timestamp when sending the delete command.
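One possible shape for this fix is sketched below. It is only an assumption about how the start time could be tracked: PendingMoveTracker, its fields and its methods are hypothetical names, and the actual type of MoveManager#pendingMoves is not shown in this issue.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified stand-in for MoveManager's bookkeeping (not Ozone code).
final class PendingMoveTracker {
  // Container ID -> epoch millis at which the move was scheduled.
  private final Map<Long, Long> moveStartMsByContainer = new ConcurrentHashMap<>();
  private final long moveTimeoutMs;

  PendingMoveTracker(long moveTimeoutMs) {
    this.moveTimeoutMs = moveTimeoutMs;
  }

  /** Records the start time when the move is scheduled. */
  void startMove(long containerId, long nowMs) {
    moveStartMsByContainer.put(containerId, nowMs);
  }

  /** SCM deadline for the delete: move start + moveTimeout, regardless of send time. */
  long deleteDeadlineMs(long containerId) {
    Long startMs = moveStartMsByContainer.get(containerId);
    if (startMs == null) {
      throw new IllegalStateException("No pending move for container " + containerId);
    }
    return startMs + moveTimeoutMs;
  }

  public static void main(String[] args) {
    PendingMoveTracker tracker = new PendingMoveTracker(55L * 60_000L);
    tracker.startMove(1L, System.currentTimeMillis());
    System.out.println("delete deadline (epoch ms) = " + tracker.deleteDeadlineMs(1L));
  }
}
```

Because the deadline is derived from the stored start time, it stays the same no matter how long the replication phase took before the delete was sent.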