[
https://issues.apache.org/jira/browse/HDDS-9281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
GuoHao updated HDDS-9281:
-------------------------
Attachment: image-2023-09-14-16-50-00-841.png
> The DatanodeCommand sent in LegacyReplicationManager does not set the deadline
> ------------------------------------------------------------------------------
>
> Key: HDDS-9281
> URL: https://issues.apache.org/jira/browse/HDDS-9281
> Project: Apache Ozone
> Issue Type: Bug
> Reporter: GuoHao
> Priority: Major
> Attachments: image-2023-09-14-16-50-00-841.png
>
>
> Description:
> When a host is shut down and the node deadline is exceeded, the node's state
> changes to DEAD, and the SCM's replication manager schedules re-replication
> of the container replicas that the node was responsible for.
> When the cluster holds a lot of data on few nodes, the remaining datanodes
> receive too many replication tasks and queue them for execution. If the dead
> datanode is restarted while these tasks are being worked through and reports
> its containers to the SCM, the replication tasks still in the queue are
> executed anyway.
> Each datanode command has a deadline, but I don't see it set in
> LegacyReplicationManager; it is only set in ReplicationManager. If
> LegacyReplicationManager also set the deadline, a replication task queued on
> a datanode would not be executed once its deadline had passed.
>
> ReplicationManager code, see
> org.apache.hadoop.hdds.scm.container.replication.ReplicationManager#sendDatanodeCommand:
> {code:java}
> public void sendDatanodeCommand(SCMCommand<?> command,
>     ContainerInfo containerInfo, DatanodeDetails target)
>     throws NotLeaderException {
>   long scmDeadline = clock.millis() + rmConf.eventTimeout;
>   sendDatanodeCommand(command, containerInfo, target, scmDeadline);
> }
> {code}
>
> LegacyReplicationManager code, see
> org.apache.hadoop.hdds.scm.container.replication.LegacyReplicationManager#sendAndTrackDatanodeCommand:
>
> {code:java}
> private <T extends Message> boolean sendAndTrackDatanodeCommand(
>     final DatanodeDetails datanode,
>     final SCMCommand<T> command,
>     final Predicate<InflightAction> tracker) {
>   try {
>     command.setTerm(scmContext.getTermOfLeader());
>   } catch (NotLeaderException nle) {
>     LOG.warn("Skip sending datanode command,"
>         + " since current SCM is not leader.", nle);
>     return false;
>   }
>   final boolean allowed = tracker.test(
>       new InflightAction(datanode, clock.millis()));
>   if (!allowed) {
>     return false;
>   }
>   final CommandForDatanode<T> datanodeCommand =
>       new CommandForDatanode<>(datanode.getUuid(), command);
>   eventPublisher.fireEvent(SCMEvents.DATANODE_COMMAND, datanodeCommand);
>   return true;
> }
> {code}
>
>
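The effect described in the report can be sketched with a small, self-contained model. This is only an illustration, not Ozone code: `DeadlineSketch`, `QueuedCommand`, and `drain` are hypothetical names, and treating a deadline of 0 as "no deadline set" is an assumption made for the example. The send time and timeout are arbitrary values standing in for `clock.millis()` and `rmConf.eventTimeout` from the snippet above.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

public class DeadlineSketch {

  /** Hypothetical stand-in for a queued datanode command (not an Ozone class). */
  static final class QueuedCommand {
    final String name;
    // Assumption for this sketch: 0 means that no deadline was ever set.
    private long deadlineEpochMs;

    QueuedCommand(String name) {
      this.name = name;
    }

    void setDeadline(long deadlineEpochMs) {
      this.deadlineEpochMs = deadlineEpochMs;
    }

    /** A command expires only if a deadline was set and "now" is past it. */
    boolean hasExpired(long nowEpochMs) {
      return deadlineEpochMs > 0 && nowEpochMs > deadlineEpochMs;
    }
  }

  /** Drains the queue, skipping expired commands; returns the names executed. */
  static List<String> drain(Queue<QueuedCommand> queue, long nowEpochMs) {
    List<String> executed = new ArrayList<>();
    QueuedCommand cmd;
    while ((cmd = queue.poll()) != null) {
      if (cmd.hasExpired(nowEpochMs)) {
        continue; // deadline passed while the command sat in the queue
      }
      executed.add(cmd.name);
    }
    return executed;
  }

  public static void main(String[] args) {
    long sentAtMs = 1_000L;       // arbitrary send time for the example
    long eventTimeoutMs = 500L;   // arbitrary timeout for the example

    // Sender that sets a deadline, as ReplicationManager does in the report.
    QueuedCommand withDeadline = new QueuedCommand("replicate-with-deadline");
    withDeadline.setDeadline(sentAtMs + eventTimeoutMs);

    // Sender that never sets a deadline, as reported for the legacy path.
    QueuedCommand withoutDeadline = new QueuedCommand("replicate-without-deadline");

    Queue<QueuedCommand> queue = new ArrayDeque<>();
    queue.add(withDeadline);
    queue.add(withoutDeadline);

    // The queue is processed long after the deadline has passed.
    System.out.println(drain(queue, sentAtMs + 2_000L));
    // prints [replicate-without-deadline]
  }
}
```

In this model, only the command without a deadline is still executed after the long queueing delay, which matches the behaviour the report describes for commands sent by LegacyReplicationManager.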
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]