[ https://issues.apache.org/jira/browse/HADOOP-14828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16153829#comment-16153829 ]
Jason Lowe commented on HADOOP-14828:
-------------------------------------
This looks like a duplicate of HADOOP-11398.
> RetryUpToMaximumTimeWithFixedSleep is not bounded by maximum time
> -----------------------------------------------------------------
>
> Key: HADOOP-14828
> URL: https://issues.apache.org/jira/browse/HADOOP-14828
> Project: Hadoop Common
> Issue Type: Bug
> Reporter: Jonathan Hung
>
> In RetryPolicies.java, RetryUpToMaximumTimeWithFixedSleep is converted to a
> RetryUpToMaximumCountWithFixedSleep whose count is maxTime / sleepTime:
> {noformat}
> public RetryUpToMaximumTimeWithFixedSleep(long maxTime, long sleepTime,
>     TimeUnit timeUnit) {
>   super((int) (maxTime / sleepTime), sleepTime, timeUnit);
>   this.maxTime = maxTime;
>   this.timeUnit = timeUnit;
> }
> {noformat}
> But if each retry attempt itself takes a long time, the total elapsed time
> exceeds the maxTime passed to RetryUpToMaximumTimeWithFixedSleep.
> As an example, while doing NM restarts, we saw an issue where the NMProxy
> creates a retry policy that specifies a maximum wait time of 15 minutes and
> a 10 sec interval (which is converted to a MaximumCount policy with 15 min /
> 10 sec = 90 tries). But each NMProxy retry in turn invokes o.a.h.ipc.Client's
> own retry policy:
> {noformat}
> if (connectionRetryPolicy == null) {
>   final int max = conf.getInt(
>       CommonConfigurationKeysPublic.IPC_CLIENT_CONNECT_MAX_RETRIES_KEY,
>       CommonConfigurationKeysPublic.IPC_CLIENT_CONNECT_MAX_RETRIES_DEFAULT);
>   final int retryInterval = conf.getInt(
>       CommonConfigurationKeysPublic.IPC_CLIENT_CONNECT_RETRY_INTERVAL_KEY,
>       CommonConfigurationKeysPublic
>           .IPC_CLIENT_CONNECT_RETRY_INTERVAL_DEFAULT);
>   connectionRetryPolicy =
>       RetryPolicies.retryUpToMaximumCountWithFixedSleep(
>           max, retryInterval, TimeUnit.MILLISECONDS);
> }
> {noformat}
> So the time it takes the NMProxy to fail is actually (90 retries) * (10 sec
> NMProxy interval + o.a.h.ipc.Client retry time). In the default case, the ipc
> client retries 10 times with a 1 sec interval, i.e. 10 sec per NMProxy
> attempt, meaning the time it takes for NMProxy to fail is
> 90 * (10 sec + 10 sec) = 1800 sec = 30 min instead of the 15 min specified by
> the NMProxy configuration.
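
The arithmetic in the description can be checked with a small standalone sketch. Everything here is hypothetical and for illustration only: the class RetryBudget and its methods are not part of Hadoop, and the deadline-style check at the end is just one possible way to bound retries by elapsed time, not necessarily the fix adopted in HADOOP-11398.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper (not Hadoop code) modelling the timing described in the issue. */
class RetryBudget {

    /** Retry count that a count-based policy derives from maxTime / sleepTime. */
    static int derivedRetryCount(long maxTime, long sleepTime) {
        return (int) (maxTime / sleepTime);
    }

    /**
     * Worst-case elapsed seconds when every outer attempt also spends
     * innerRetries * innerIntervalSec inside a nested retry loop.
     */
    static long worstCaseSeconds(long maxTimeSec, long sleepSec,
                                 int innerRetries, long innerIntervalSec) {
        int count = derivedRetryCount(maxTimeSec, sleepSec);
        return count * (sleepSec + innerRetries * innerIntervalSec);
    }

    /**
     * Assumed alternative: decide from elapsed wall-clock time rather than
     * from an attempt count, so slow attempts cannot stretch the budget.
     */
    static boolean shouldRetry(long startNanos, long nowNanos,
                               long maxTime, TimeUnit unit) {
        return nowNanos - startNanos < unit.toNanos(maxTime);
    }

    public static void main(String[] args) {
        // 15 min budget, 10 sec sleep, inner client: 10 retries at 1 sec each.
        long total = worstCaseSeconds(TimeUnit.MINUTES.toSeconds(15), 10, 10, 1);
        System.out.println(total + " sec = "
            + TimeUnit.SECONDS.toMinutes(total) + " min");
        // prints "1800 sec = 30 min"
    }
}
```

With the defaults quoted above, the count-based model yields 90 attempts and 30 minutes, double the configured 15 minutes, whereas a check like shouldRetry stops once the configured time has elapsed regardless of how long each attempt took.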
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)