[
https://issues.apache.org/jira/browse/HDDS-12135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Stephen O'Donnell updated HDDS-12135:
-------------------------------------
Description:
We recently found that delete commands can run for a long time once picked off
the queue. With the default 10 minute deadline on SCM and a datanode deadline
only 30 seconds shorter, commands that are still running can be seen as expired
in SCM.
This PR makes the defaults less aggressive, giving an SCM / RM timeout of
12 minutes and a datanode timeout of 6 minutes. That way, commands have longer
to be processed before RM resends them.
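As a rough illustration of the timing described above, the following sketch
models when SCM would expire a command that is still running. It is not Ozone
code: the function name, its parameters, and the model itself (the datanode
deadline is the SCM deadline minus an offset, the datanode will not start a
command after its own deadline, and a started command runs to completion) are
assumptions made for this example.

```python
from datetime import timedelta


def expires_while_running(queued_for, run_time, scm_timeout, datanode_offset):
    """Return True if SCM would see the command as expired while it still runs.

    Simplified model (an assumption, not Ozone code): the datanode refuses to
    start a command once its own deadline has passed, and a started command
    always runs to completion.
    """
    datanode_timeout = scm_timeout - datanode_offset
    if queued_for >= datanode_timeout:
        # The datanode drops the command instead of starting it.
        return False
    return queued_for + run_time > scm_timeout


# Defaults quoted in the description above.
OLD = dict(scm_timeout=timedelta(minutes=10), datanode_offset=timedelta(seconds=30))
NEW = dict(scm_timeout=timedelta(minutes=12), datanode_offset=timedelta(minutes=6))

# Worst case: a command picked up just before the datanode deadline.
queued_old = timedelta(minutes=9, seconds=29)
queued_new = timedelta(minutes=5, seconds=59)
run_time = timedelta(minutes=2)
print(expires_while_running(queued_old, run_time, **OLD))
print(expires_while_running(queued_new, run_time, **NEW))
```

Under this model, a 2 minute delete started just before the datanode deadline
expires in SCM with the old defaults but not with the new ones. The offset is
the longest run time that is always safe: 30 seconds before, 6 minutes after.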
With the throttling that RM employs, there should not be a large number of
commands on the queue anyway, as the goal of RM is to schedule only the number
of commands which can be processed in a heartbeat or two.
Other related Jiras to this one are: HDDS-12127, HDDS-12115, HDDS-12114
> Set RM default deadline to 12 minutes and the datanode offset to 6 minutes
> --------------------------------------------------------------------------
>
> Key: HDDS-12135
> URL: https://issues.apache.org/jira/browse/HDDS-12135
> Project: Apache Ozone
> Issue Type: Improvement
> Components: SCM
> Reporter: Stephen O'Donnell
> Assignee: Stephen O'Donnell
> Priority: Major