[
https://issues.apache.org/jira/browse/HADOOP-11157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14187567#comment-14187567
]
Gregory Chanan commented on HADOOP-11157:
-----------------------------------------
[~kkambatl] while writing up a test as you requested, I found a number of other
issues. This will be kind of scatter-brained, sorry:
1) related to shutdown
- a) the ExpiredToken thread is shut down after the ZKDelegationTokenSecretManager's
curator, which causes an exception to be thrown and the process to exit. This
can be addressed by shutting down the ExpiredToken thread before the curator.
- b) even with a), the ExpiredTokenThread is interrupted by
AbstractDelegationTokenSecretManager.closeThreads...if the ExpiredTokenThread
is currently rolling the master key or expiring tokens in ZK, the interruption
will cause the process to exit. It seems like this can be addressed by holding
the noInterruptsLock while the ExpiredTokenThread is not sleeping (should be
waiting), but I'm not sure if we want to go that route. Perhaps alternatively
we could deal with the interruption by checking if it's expected (i.e. if
running is false). One issue with that approach is that the
ZKDelegationTokenSecretManager functions called from the ExpiredTokenThread
don't propagate the interruption or preserve the interrupt flag; they just
catch the exceptions and possibly rethrow them as runtime exceptions. I'm not
sure if we can just
swallow the InterruptedException -- presumably we need the ZK state to be in
some reasonable state in case the process restarts? Of course we have no tests
of that...
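To make 1a) and 1b) concrete, here is a minimal Java sketch of the ordering and of the "is this interruption expected?" check. It is only an illustration under stated assumptions: ShutdownOrderSketch, expireLoop, clientOpen and demo are hypothetical names, the AtomicBoolean merely stands in for the curator client, and none of this is the actual Hadoop code. It also says nothing about whether the ZK state is left consistent when the work is cut short.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for the secret manager; not the Hadoop classes.
class ShutdownOrderSketch {
    private volatile boolean running = true;
    // Stands in for the curator client: true while it is still usable.
    private final AtomicBoolean clientOpen = new AtomicBoolean(true);
    private final AtomicBoolean failedUnexpectedly = new AtomicBoolean(false);
    private final Thread expirer = new Thread(this::expireLoop, "expirer");

    void start() {
        expirer.start();
    }

    private void expireLoop() {
        while (running) {
            if (!clientOpen.get()) {
                // The client was closed underneath us: the failure from 1a).
                failedUnexpectedly.set(true);
                return;
            }
            try {
                Thread.sleep(10); // stands in for key rolling / token expiry
            } catch (InterruptedException e) {
                if (running) {
                    // Nobody asked us to stop, so treat this as a real error.
                    failedUnexpectedly.set(true);
                }
                Thread.currentThread().interrupt(); // keep the interrupt flag
                return;
            }
        }
    }

    void stop() throws InterruptedException {
        running = false;       // 1. mark the coming interruption as expected
        expirer.interrupt();   // 2. wake the thread if it is sleeping
        expirer.join();        // 3. wait until it has really finished
        clientOpen.set(false); // 4. only now close the client
    }

    boolean failedUnexpectedly() {
        return failedUnexpectedly.get();
    }

    // Runs one start/stop cycle; returns true if shutdown was clean.
    static boolean demo() {
        ShutdownOrderSketch s = new ShutdownOrderSketch();
        s.start();
        try {
            Thread.sleep(50);
            s.stop();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !s.failedUnexpectedly();
    }
}
```

Because running is cleared before the interrupt and the client is closed only after join() returns, the worker in this sketch never sees a closed client and never treats the shutdown interrupt as an error.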
2) not related to shutdown
- a) if you run TestZKDelegationTokenSecretManager#testCancelTokenSingleManager
in a loop it will fail eventually. It looks like the issue is how we deal with
asynchronous ZK updates.
Consider the following code:
{code}
token = createToken
cancelToken(token)
verifyToken(token)
{code}
cancelToken will delete it from the local cache and delete the znode. But the
curator client will get the create child message (in the listener thread) and
add the token back. If that happens after cancelToken, the token will stay in
the local cache until the listener thread also processes the delete message.
(It also just occurred to me that this is happening in two different threads
but some of the structures, like the currentToken, aren't thread-safe). The
usual way to
prevent this is to assign versions to the znodes so you can track whether you
are getting an update for an old version. I don't know how to deal with it in
this case where deletes are a possibility and there doesn't appear to be a
master that is responsible for writing (i.e. what is preventing some other
SecretManager from recreating the token just after delete -- how would versions
help with that?). This may affect the keyCache as well as the tokenCache.
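The create/cancel race in 2a) can be reproduced deterministically with a small single-threaded Java model. This is a sketch under assumptions, not the real implementation: StaleEventSketch, its queue of pending events, and deliverOne are hypothetical names, and the queue simply plays the role of notifications that the listener thread has not yet applied.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;

// Hypothetical model of a local cache fed by delayed change notifications.
class StaleEventSketch {
    // Local cache, in the role of the token cache.
    private final Set<String> cache = new HashSet<>();
    // Events already emitted by the store but not yet applied by the listener.
    private final Queue<String[]> pending = new ArrayDeque<>();

    void createToken(String id) {
        cache.add(id);                              // local update
        pending.add(new String[] {"ADDED", id});    // notification arrives later
    }

    void cancelToken(String id) {
        cache.remove(id);                           // local update
        pending.add(new String[] {"REMOVED", id});  // notification arrives later
    }

    // Applies the oldest pending event, as the listener thread would.
    // Returns false if there was nothing to deliver.
    boolean deliverOne() {
        String[] event = pending.poll();
        if (event == null) {
            return false;
        }
        if (event[0].equals("ADDED")) {
            cache.add(event[1]);
        } else {
            cache.remove(event[1]);
        }
        return true;
    }

    boolean isValid(String id) {
        return cache.contains(id);
    }
}
```

Calling createToken("t1") and then cancelToken("t1") leaves the token absent from the cache. The first deliverOne() applies the stale ADDED event and the cancelled token becomes valid again; only the second deliverOne() removes it. A verifyToken call that lands between the two deliveries sees the cancelled token as valid.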
> ZKDelegationTokenSecretManager never shuts down listenerThreadPool
> ------------------------------------------------------------------
>
> Key: HADOOP-11157
> URL: https://issues.apache.org/jira/browse/HADOOP-11157
> Project: Hadoop Common
> Issue Type: Bug
> Components: security
> Affects Versions: 2.6.0
> Reporter: Gregory Chanan
> Assignee: Gregory Chanan
> Attachments: HADOOP-11157.patch, HADOOP-11157.patch
>
>
> I'm trying to integrate Solr with the DelegationTokenAuthenticationFilter and
> running into this issue. The solr unit tests look for leaked threads and
> when I started using the ZKDelegationTokenSecretManager it started reporting
> leaks. Shutting down the listenerThreadPool after the objects that use it
> resolves the leaked-thread errors.