[ 
https://issues.apache.org/jira/browse/HDDS-12135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell updated HDDS-12135:
-------------------------------------
    Description: 
We recently found that delete commands can run for a long time once picked off 
the queue. With the default 10 minute deadline on SCM and a datanode deadline 
only 30 seconds shorter, commands that are still running on a datanode can be 
seen as expired in SCM.

This PR makes the defaults less aggressive, giving an SCM / RM timeout of 
12 minutes and a datanode offset of 6 minutes, which leaves a datanode timeout 
of 6 minutes. That way, commands have longer to be processed before RM resends 
them.
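
The arithmetic behind the old and new defaults can be sketched as below. This
is only an illustration: it assumes the datanode deadline is the SCM deadline
minus the datanode offset (as the "30 seconds less" wording and the issue
title suggest), and the constant and function names are made up for the
example, not Ozone configuration keys or APIs.

```python
from datetime import timedelta

# Values taken from the description above. The names are illustrative only;
# they are not Ozone configuration keys.
OLD_SCM_TIMEOUT = timedelta(minutes=10)
OLD_DATANODE_OFFSET = timedelta(seconds=30)
NEW_SCM_TIMEOUT = timedelta(minutes=12)
NEW_DATANODE_OFFSET = timedelta(minutes=6)


def datanode_deadline(scm_timeout, datanode_offset):
    """Assumed model: the datanode deadline is the SCM deadline minus the offset."""
    return scm_timeout - datanode_offset


def run_time_before_scm_expiry(scm_timeout, datanode_offset):
    """Worst-case time a command has to finish if it is picked off the queue
    right at the datanode deadline, before SCM sees it as expired."""
    return scm_timeout - datanode_deadline(scm_timeout, datanode_offset)


for label, timeout, offset in [
    ("old", OLD_SCM_TIMEOUT, OLD_DATANODE_OFFSET),
    ("new", NEW_SCM_TIMEOUT, NEW_DATANODE_OFFSET),
]:
    print(
        f"{label}: datanode deadline {datanode_deadline(timeout, offset)}, "
        f"worst-case run time {run_time_before_scm_expiry(timeout, offset)}"
    )
```

Under this model, a command picked off the queue just before the datanode
deadline had only 30 seconds to finish with the old defaults, and has 6 minutes
with the new ones.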

With the throttling that RM employs, there should not be a large number of 
commands on the queue anyway, as the goal of RM is to schedule only the number 
of commands which can be processed in a heartbeat or two.

Other related Jiras to this one are: HDDS-12127, HDDS-12115, HDDS-12114

  was:
We recently found that delete commands can run for a long time once picked off 
the queue, and the default of a 10 minute deadline on SCM and 30 seconds less 
deadline on the datanodes can result in currently running commands being seen 
as expired in SCM.

This PR is to make the defaults less aggressive - giving a SCM / RM timeout of 
12 minutes and a datanode timeout of 6 minutes. That way, there is longer for 
commands to be processed before RM will resend them.

With the throttling that RM employs, there should not be a large number of 
commands on the queue anyway, as the goal of RM is to schedule only the number 
of commands which can be processed in a heartbeat or two.


> Set RM default deadline to 12 minutes and the datanode offset to 6 minutes
> --------------------------------------------------------------------------
>
>                 Key: HDDS-12135
>                 URL: https://issues.apache.org/jira/browse/HDDS-12135
>             Project: Apache Ozone
>          Issue Type: Improvement
>          Components: SCM
>            Reporter: Stephen O'Donnell
>            Assignee: Stephen O'Donnell
>            Priority: Major


