[ 
https://issues.apache.org/jira/browse/HADOOP-13590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HADOOP-13590:
-------------------------------
    Attachment: HADOOP-13590.07.patch

Thanks for the feedback, [~ste...@apache.org].

bq. is there any reason not to use a RetryPolicy here
Good question! The reason is the following:
First of all, we definitely want exponential backoff, so that our retries do 
not DDoS the KDC.

In {{RetryPolicies}}, there is no {{RetryUpToMaximumTimeWithProportionalSleep}}, 
and IMO the reason one is missing is that it's not feasible/maintainable to 
calculate a {{maxRetries}} inline when invoking the base class ctor. It 
eventually amounts to calculating a Taylor series, IIUC.

In our case, we could calculate the {{maxRetries}} beforehand, then initialize 
a {{retryUpToMaximumCountWithProportionalSleep}} accordingly. That ends up with 
code similar to {{getNextTgtRenewalTime}} in the caller. Moreover, I personally 
feel that a last retry just before expiry could be helpful; otherwise the 
backoff will likely skip past the end time.
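
To illustrate the idea (exponential backoff, plus one final attempt just 
before expiry), here is a minimal sketch. It is not the code from the patch: 
the class name, {{nextRetryTime}}, and the {{baseSleepMs}} / {{finalMarginMs}} 
parameters are hypothetical and only show the shape of the calculation.

```java
public final class TgtRetrySketch {
  private TgtRetrySketch() {}

  /**
   * Hypothetical helper, not the patch's getNextTgtRenewalTime.
   *
   * @param nowMs         current time in milliseconds
   * @param tgtEndMs      time at which the TGT expires
   * @param attempt       number of retries already made (0 for the first)
   * @param baseSleepMs   assumed base sleep; doubled on every attempt
   * @param finalMarginMs assumed margin before expiry for the last retry
   * @return time of the next retry, or -1 if no retry fits before expiry
   */
  public static long nextRetryTime(long nowMs, long tgtEndMs, int attempt,
      long baseSleepMs, long finalMarginMs) {
    long lastChanceMs = tgtEndMs - finalMarginMs;
    if (nowMs >= lastChanceMs) {
      return -1L; // too close to expiry, nothing left to try
    }
    // Exponential backoff: base * 2^attempt. The exponent is capped so the
    // multiplication cannot overflow for reasonably small base values.
    long sleepMs = baseSleepMs * (1L << Math.min(attempt, 20));
    // If the backoff would skip past expiry, fall back to one last retry
    // shortly before the TGT end time instead of giving up.
    return Math.min(nowMs + sleepMs, lastChanceMs);
  }

  public static void main(String[] args) {
    long now = 0L;
    long tgtEnd = 60_000L;
    int attempt = 0;
    long next;
    while ((next = nextRetryTime(now, tgtEnd, attempt, 1_000L, 500L)) >= 0) {
      System.out.println("retry #" + attempt + " at t=" + next + " ms");
      now = next; // pretend the retry failed and time advanced to it
      attempt++;
    }
  }
}
```

The {{Math.min}} at the end is the part a count-based policy does not give us 
for free: the schedule is clamped so that the last attempt lands shortly 
before the TGT end time.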

bq. Test can probably import org.apache.hadoop.conf.Configuration rather than 
declare variables that way.
Not really, there's a conflict with {{javax.security.auth.login.Configuration}}. 
On second thought, I switched the two to make Hadoop's {{Configuration}} the 
default.


Other comments are addressed in patch 7.
Regarding the test, a real test is brittle and a bit time-consuming (due 
to {{TICKET_RENEW_WINDOW}}), but a fake test, as [~drankye] pointed out, 
is... fake. I don't have a strong opinion, but if it ends up spamming 
pre-commit, we may switch to the mock test after all.

> Retry until TGT expires even if the UGI renewal thread encountered exception
> ----------------------------------------------------------------------------
>
>                 Key: HADOOP-13590
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13590
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: security
>    Affects Versions: 2.8.0, 2.7.3, 2.6.4
>            Reporter: Xiao Chen
>            Assignee: Xiao Chen
>         Attachments: HADOOP-13590.01.patch, HADOOP-13590.02.patch, 
> HADOOP-13590.03.patch, HADOOP-13590.04.patch, HADOOP-13590.05.patch, 
> HADOOP-13590.06.patch, HADOOP-13590.07.patch
>
>
> The UGI has a background thread to renew the tgt. On exception, it 
> [terminates 
> itself|https://github.com/apache/hadoop/blob/bee9f57f5ca9f037ade932c6fd01b0dad47a1296/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java#L1013-L1014]
> If something temporarily goes wrong and results in an IOE, then even after 
> it recovers no renewal will be done, and the client will eventually fail to 
> authenticate. We should retry on a best-effort basis until the TGT expires, 
> in the hope that the error recovers before that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
