[
https://issues.apache.org/jira/browse/HDDS-11343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
JiangHua Zhu updated HDDS-11343:
--------------------------------
Description:
When S3G retries a request to OM, the RPC sleep time grows linearly with the
retry count: it is {ozone.client.wait.between.retries.millis} multiplied by
the current retry number, for up to {ozone.client.failover.max.attempts}
attempts.
sleep time = {ozone.client.wait.between.retries.millis} * retry
When retry=1, sleep time = 2000ms.
When retry=2, sleep time = 4000ms.
......
Data from our online clusters shows the sleep time reaching tens of seconds
or even longer, so each additional retry is very expensive.
We can reduce this cost without changing the linear growth of the calculated
value: when choosing the sleep time, pick a random value bounded by the
calculated value instead of always sleeping for the full amount.
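As a rough illustration of the idea above, here is a minimal Java sketch. It
is not the actual Ozone failover code: the class and method names are
hypothetical, and it assumes (following the retry=1 and retry=2 examples) that
the multiplier is the current retry number and that the random value is drawn
uniformly from [0, calculated value).

```java
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical sketch of the proposal; not the actual Ozone failover code. */
final class RetrySleepSketch {

  private RetrySleepSketch() { }

  /** Behaviour as described: base wait multiplied by the retry number. */
  static long linearSleepMillis(long waitBetweenRetriesMillis, int retry) {
    if (waitBetweenRetriesMillis < 0 || retry < 0) {
      throw new IllegalArgumentException("arguments must be non-negative");
    }
    return Math.multiplyExact(waitBetweenRetriesMillis, (long) retry);
  }

  /**
   * Proposed behaviour (assumed form): a uniform random value in
   * [0, linear sleep time), so the upper bound still grows linearly.
   */
  static long randomizedSleepMillis(long waitBetweenRetriesMillis, int retry) {
    long upperBound = linearSleepMillis(waitBetweenRetriesMillis, retry);
    if (upperBound == 0) {
      return 0; // nextLong(bound) requires a strictly positive bound
    }
    return ThreadLocalRandom.current().nextLong(upperBound);
  }

  public static void main(String[] args) {
    long waitMillis = 2000L; // base wait taken from the examples above
    for (int retry = 1; retry <= 3; retry++) {
      System.out.println("retry=" + retry
          + " linear=" + linearSleepMillis(waitMillis, retry) + "ms"
          + " randomized=" + randomizedSleepMillis(waitMillis, retry) + "ms");
    }
  }
}
```

With a uniform draw the expected sleep is about half of the calculated value,
while the worst case stays the same as today. Whether the lower bound should
be 0 or some minimum wait is left open here.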
was:
When S3G retries a request to OM, the RPC sleep time grows linearly with the
retry count: it is {ozone.client.wait.between.retries.millis} multiplied by
the current retry number, for up to {ozone.client.failover.max.attempts}
attempts.
Retry time = {ozone.client.wait.between.retries.millis} * retry
When retry=1, sleep time = 2000ms.
When retry=2, sleep time = 4000ms.
......
Data from our online clusters shows the sleep time reaching tens of seconds
or even longer, so each additional retry is very expensive.
We can reduce this cost without changing the linear growth of the calculated
value: when choosing the sleep time, pick a random value bounded by the
calculated value instead of always sleeping for the full amount.
> S3G randomly selects retry time when retrying OM
> ------------------------------------------------
>
> Key: HDDS-11343
> URL: https://issues.apache.org/jira/browse/HDDS-11343
> Project: Apache Ozone
> Issue Type: Improvement
> Components: s3gateway
> Affects Versions: 1.4.0
> Reporter: JiangHua Zhu
> Assignee: JiangHua Zhu
> Priority: Major