[ https://issues.apache.org/jira/browse/HADOOP-13487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15422187#comment-15422187 ]
Xiao Chen commented on HADOOP-13487:
------------------------------------
Thanks Alex for the details!
I confirm this is a bug in {{ZKDelegationTokenSecretManager}}. The root
cause is that when KMS restarts, it does not actively load the existing znodes
into its cache. Hence if a token is never accessed again (e.g. after a cluster-wide
restart), its znode is not managed by any KMS instance and is eventually leaked.
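A minimal model of the leak, to make the failure mode concrete. It assumes, as described above, that the expired-token remover only walks the instance's in-memory cache and that a restart does not reload existing znodes. The two maps stand in for ZooKeeper and the cache; the class and method names are hypothetical and are not Hadoop APIs.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the leak: "store" stands in for the token znodes in
// ZooKeeper and "cache" for the secret manager's in-memory token map.
// None of these names are Hadoop APIs.
class CacheOnlyRemoverDemo {
    // token sequence number -> expiry time in ms
    final Map<Integer, Long> store = new HashMap<>();
    final Map<Integer, Long> cache = new HashMap<>();

    // A running instance writes a new token to both ZooKeeper and its cache.
    void createToken(int seq, long expiryMs) {
        store.put(seq, expiryMs);
        cache.put(seq, expiryMs);
    }

    // A restart leaves the cache empty; existing znodes are NOT loaded back.
    void restart() {
        cache.clear();
    }

    // Like the periodic remover thread, this only looks at cached tokens;
    // an expired cached token is also deleted from the store.
    void removeExpiredTokens(long nowMs) {
        cache.entrySet().removeIf(e -> {
            if (e.getValue() < nowMs) {
                store.remove(e.getKey());
                return true;
            }
            return false;
        });
    }

    public static void main(String[] args) {
        CacheOnlyRemoverDemo kms = new CacheOnlyRemoverDemo();
        kms.createToken(1, 100L);
        kms.createToken(2, 100L);
        kms.restart();                   // e.g. a cluster-wide restart
        kms.removeExpiredTokens(1_000L); // both tokens are long expired...
        // ...but neither was reloaded into the cache, so both znodes remain.
        System.out.println("znodes left: " + kms.store.size()); // prints "znodes left: 2"
    }
}
```

Without the restart, the same remover deletes both entries; the leak comes entirely from the cache not reflecting what is already in ZooKeeper.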
Obviously we can't have any logic that depends on a KMS stop, since when ZooKeeper
is used we're supposed to have multiple KMS instances.
I can think of several options for fixing this:
# Always load existing znodes into the cache. This would be straightforward but
may hurt startup time.
# Have another background thread periodically check the znodes and remove
expired ones.
# Have a separate process do #2, so that we don't waste resources on
multiple KMS instances doing the same cleanup work.
I'm thinking of a modified #1. Specifically, on KMS restart, start a
thread that fetches the znodes and iterates through them to remove the expired
tokens. We can add a random delay before this background task runs after startup,
to avoid multiple KMS instances racing on the same cleanup work.
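A rough sketch of the modified #1, under stated assumptions: a ConcurrentHashMap stands in for the token znodes (sequence number to expiry time), and StartupTokenSweeper, scheduleSweep and sweep are hypothetical names, not part of ZKDelegationTokenSecretManager. It schedules a single scan after a random delay and deletes only tokens that are already expired. The remove(key, value) call turns a token already deleted by another instance into a no-op; against real ZooKeeper the equivalent would be tolerating a NoNode error on delete.

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: after startup, wait a random delay, then scan every
// stored token (not just cached ones) and delete the expired ones.
// The ConcurrentMap stands in for the ZooKeeper token znodes.
class StartupTokenSweeper {
    private final ConcurrentMap<Integer, Long> store; // sequence number -> expiry (ms)
    private final ScheduledExecutorService executor =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "expired-token-sweeper");
            t.setDaemon(true);
            return t;
        });

    StartupTokenSweeper(ConcurrentMap<Integer, Long> store) {
        this.store = store;
    }

    // Schedules one sweep after a random delay in [0, maxDelayMs], so that
    // instances restarted together are unlikely to sweep at the same moment.
    ScheduledFuture<Integer> scheduleSweep(long maxDelayMs) {
        long delay = ThreadLocalRandom.current().nextLong(maxDelayMs + 1);
        Callable<Integer> task = () -> sweep(System.currentTimeMillis());
        return executor.schedule(task, delay, TimeUnit.MILLISECONDS);
    }

    // Removes every expired entry and returns how many THIS instance removed.
    // remove(key, value) only succeeds if the entry is still present, so losing
    // a race with another instance is harmless.
    int sweep(long nowMs) {
        int removed = 0;
        for (Map.Entry<Integer, Long> e : store.entrySet()) {
            if (e.getValue() < nowMs && store.remove(e.getKey(), e.getValue())) {
                removed++;
            }
        }
        return removed;
    }

    void shutdown() {
        executor.shutdownNow();
    }

    public static void main(String[] args) throws Exception {
        ConcurrentHashMap<Integer, Long> znodes = new ConcurrentHashMap<>();
        long now = System.currentTimeMillis();
        znodes.put(1, now - 60_000L);    // expired a minute ago
        znodes.put(2, now + 3_600_000L); // valid for another hour
        StartupTokenSweeper kms = new StartupTokenSweeper(znodes);
        int removed = kms.scheduleSweep(500).get(); // waits for the delayed sweep
        kms.shutdown();
        System.out.println("removed " + removed + ", remaining " + znodes.keySet());
        // prints "removed 1, remaining [2]"
    }
}
```

The random delay only makes duplicate work unlikely; correctness comes from the delete being safe to lose a race on, so two instances sweeping at once still remove each expired token exactly once.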
> Hadoop KMS doesn't clean up old delegation tokens stored in Zookeeper
> ---------------------------------------------------------------------
>
> Key: HADOOP-13487
> URL: https://issues.apache.org/jira/browse/HADOOP-13487
> Project: Hadoop Common
> Issue Type: Bug
> Components: kms
> Affects Versions: 2.6.0
> Reporter: Alex Ivanov
>
> Configuration:
> CDH 5.5.1 (Hadoop 2.6+)
> KMS configured to store delegation tokens in Zookeeper
> DEBUG logging enabled in /etc/hadoop-kms/conf/kms-log4j.properties
> Findings:
> It seems to me delegation tokens never get cleaned up from Zookeeper past
> their renewal date. I can see in the logs that the removal thread is started
> with the expected interval:
> {code}
> 2016-08-11 08:15:24,511 INFO AbstractDelegationTokenSecretManager - Starting
> expired delegation token remover thread, tokenRemoverScanInterval=60 min(s)
> {code}
> However, I don't see any delegation token removals, which would be indicated
> by the following log message in
> org.apache.hadoop.security.token.delegation.ZKDelegationTokenSecretManager
> --> removeStoredToken(TokenIdent ident), line 769 [CDH]:
> {code}
> if (LOG.isDebugEnabled()) {
> LOG.debug("Removing ZKDTSMDelegationToken_"
> + ident.getSequenceNumber());
> }
> {code}
> Meanwhile, I see a lot of expired delegation tokens in Zookeeper that don't
> get cleaned up.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)