[ https://issues.apache.org/jira/browse/HADOOP-13590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xiao Chen updated HADOOP-13590:
-------------------------------
    Attachment: HADOOP-13590.07.patch

Thanks for the feedback, [~ste...@apache.org].

bq. is there any reason not to use a RetryPolicy here

Good question! The reasons are as follows.

First of all, we definitely want exponential backoff, so that we don't end up DDoS-ing the KDC. {{RetryPolicies}} has no {{RetryUpToMaximumTimeWithProportionalSleep}}, and IMO the reason one is missing is that it's not feasible/maintainable to calculate a {{maxRetries}} inline when invoking the base class ctor. It's eventually calculating a Taylor series IIUC.

In our case, we could calculate the {{maxRetries}} beforehand, then initialize a {{retryUpToMaximumCountWithProportionalSleep}} accordingly. That ends up as code similar to {{getNextTgtRenewalTime}} in the caller.

Moreover, personally I feel the last retry before expiry could be helpful; otherwise the backoff will likely miss the end time.

bq. Test can probably import org.apache.hadoop.conf.Configuration rather than declare variables that way.

Not really, there's a conflict with {{javax.security.auth.login.Configuration}}. On second thought, I switched the two to make Hadoop's {{Configuration}} the default.

Other comments are addressed in patch 7.

Regarding the test, having a real test is brittle and a bit time-consuming (due to {{TICKET_RENEW_WINDOW}}), but having a fake test, as [~drankye] pointed out, is... fake. I don't have a strong opinion, but if it ends up spamming pre-commit, we may switch to the mock test after all.
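To make the backoff reasoning above concrete, here is a minimal sketch of how a next-retry time could be computed with exponential backoff while still guaranteeing one last attempt shortly before the ticket expires. This is not the code in the patch: the class {{TgtRetrySketch}}, the method {{nextRetryTime}}, and the parameters {{baseSleepMs}} and {{minGapMs}} are all hypothetical names chosen for illustration.

```java
public final class TgtRetrySketch {
  private TgtRetrySketch() {}

  /**
   * Hypothetical helper (not from the patch): returns the wall-clock time in
   * milliseconds at which the next renewal retry should run, or -1 if no
   * further retry is possible before the ticket expires.
   *
   * @param nowMs       current time
   * @param tgtEndMs    time at which the TGT expires
   * @param attempt     number of retries already made (0 for the first retry)
   * @param baseSleepMs sleep before the first retry; doubled on each attempt
   * @param minGapMs    how long before expiry the final attempt is scheduled
   */
  static long nextRetryTime(long nowMs, long tgtEndMs, int attempt,
      long baseSleepMs, long minGapMs) {
    if (attempt < 0 || baseSleepMs <= 0 || minGapMs < 0) {
      throw new IllegalArgumentException("invalid retry parameters");
    }
    // Latest moment at which a retry is still worth making.
    long lastChanceMs = tgtEndMs - minGapMs;
    if (nowMs >= lastChanceMs) {
      return -1; // already at or past the final attempt
    }
    // Exponential backoff: base * 2^attempt, with the exponent capped so the
    // shift stays well-behaved for typical sleep values.
    long backoffMs = baseSleepMs << Math.min(attempt, 20);
    // Clamp to the last chance, so the backoff never skips past the end time.
    return Math.min(nowMs + backoffMs, lastChanceMs);
  }

  public static void main(String[] args) {
    // Walk a schedule for a ticket expiring 100 s from time 0.
    long now = 0;
    long end = 100_000;
    for (int attempt = 0; ; attempt++) {
      long next = nextRetryTime(now, end, attempt, 1_000, 500);
      if (next < 0) {
        break;
      }
      System.out.println("retry " + attempt + " at " + next + " ms");
      now = next;
    }
  }
}
```

The clamp in the last line of {{nextRetryTime}} is what provides the "last retry before expiry": once the doubled sleep would overshoot the end time, the retry is pulled back to {{tgtEndMs - minGapMs}} instead of being dropped.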
> Retry until TGT expires even if the UGI renewal thread encountered exception
> ----------------------------------------------------------------------------
>
>                 Key: HADOOP-13590
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13590
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: security
>    Affects Versions: 2.8.0, 2.7.3, 2.6.4
>            Reporter: Xiao Chen
>            Assignee: Xiao Chen
>         Attachments: HADOOP-13590.01.patch, HADOOP-13590.02.patch,
> HADOOP-13590.03.patch, HADOOP-13590.04.patch, HADOOP-13590.05.patch,
> HADOOP-13590.06.patch, HADOOP-13590.07.patch
>
>
> The UGI has a background thread to renew the tgt. On exception, it
> [terminates itself|https://github.com/apache/hadoop/blob/bee9f57f5ca9f037ade932c6fd01b0dad47a1296/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java#L1013-L1014]
> If something temporarily goes wrong that results in an IOE, even if it
> recovered no renewal will be done and client will eventually fail to
> authenticate. We should retry with our best effort, until tgt expires, in the
> hope that the error recovers before that.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)