[ 
https://issues.apache.org/jira/browse/HADOOP-11157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14187567#comment-14187567
 ] 

Gregory Chanan commented on HADOOP-11157:
-----------------------------------------

[~kkambatl] while writing up a test as you requested, I found a number of other 
issues.  This will be kind of scatter-brained, sorry:

1) related to shutdown
- a) the ExpiredTokenThread is shut down after the ZKDelegationTokenSecretManager's 
curator, which causes an exception to be thrown and the process to exit.  This 
can be addressed by shutting down the ExpiredTokenThread before the curator.
- b) even with a), the ExpiredTokenThread is interrupted by 
AbstractDelegationTokenSecretManager.closeThreads.  If the ExpiredTokenThread 
is in the middle of rolling the master key or expiring tokens in ZK, the 
interruption will cause the process to exit.  It seems like this can be 
addressed by holding the noInterruptsLock whenever the ExpiredTokenThread is 
not sleeping (it should be waiting), but I'm not sure if we want to go that 
route.  Alternatively, we could handle the interruption by checking whether it 
is expected (i.e. whether running is false).  One issue with that approach is 
that the ZKDelegationTokenSecretManager functions called from the 
ExpiredTokenThread don't rethrow the InterruptedException or keep the interrupt 
flag; they just catch the exceptions and possibly rethrow them as runtime 
exceptions.  I'm not sure we can simply swallow the InterruptedException: 
presumably ZK needs to be left in some reasonable state in case the process 
restarts?  Of course, we have no tests of that...
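To make the two shutdown ideas in 1) concrete, here is a minimal Java sketch that combines them: stop and join the background thread before closing the client (1a), interrupt it only while holding a lock that the thread also holds during its ZK work, and treat an interrupt as expected when running is false (1b). Everything here is hypothetical: FakeClient stands in for the curator client, and ShutdownOrderSketch is not the actual ZKDelegationTokenSecretManager or AbstractDelegationTokenSecretManager code; only the names noInterruptsLock and running are borrowed from the discussion above.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; not the Hadoop implementation.
final class ShutdownOrderSketch {
    // Stand-in for the curator client: any ZK work after close() fails.
    static final class FakeClient implements AutoCloseable {
        private volatile boolean closed = false;

        void doZkWork() {
            if (closed) {
                throw new IllegalStateException("client already closed");
            }
        }

        boolean isClosed() { return closed; }

        @Override
        public void close() { closed = true; }
    }

    private final FakeClient client = new FakeClient();
    private final Object noInterruptsLock = new Object();
    private final AtomicInteger cycles = new AtomicInteger();
    private final Thread expiredTokenThread =
        new Thread(this::runLoop, "expired-token-sketch");
    private volatile boolean running = true;
    private volatile Throwable failure = null;

    void start() { expiredTokenThread.start(); }

    private void runLoop() {
        try {
            while (running) {
                // ZK work runs under the lock, so stop() never interrupts it midway.
                synchronized (noInterruptsLock) {
                    if (!running) {
                        break;
                    }
                    client.doZkWork(); // e.g. roll the master key, expire tokens
                    cycles.incrementAndGet();
                }
                Thread.sleep(5); // interrupts surface here, outside the ZK work
            }
        } catch (InterruptedException e) {
            if (running) {
                // Unexpected interrupt: record it and preserve the interrupt flag.
                failure = e;
                Thread.currentThread().interrupt();
            }
            // Otherwise the interrupt came from stop(): exit quietly.
        } catch (RuntimeException e) {
            failure = e; // e.g. ZK work attempted after the client was closed
        }
    }

    // Order matters: stop and join the worker first, close the client last.
    void stop() {
        running = false;
        synchronized (noInterruptsLock) {
            expiredTokenThread.interrupt();
        }
        try {
            expiredTokenThread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        client.close();
    }

    int cycles() { return cycles.get(); }
    Throwable failure() { return failure; }
    boolean clientClosed() { return client.isClosed(); }
}
```

Because stop() sets running before taking the lock, the worker either sees running == false at the top of its critical section or gets the InterruptedException in sleep, so the interrupt never lands in the middle of a ZK update.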
2) not related to shutdown
- a) if you run TestZKDelegationTokenSecretManager#testCancelTokenSingleManager 
in a loop, it eventually fails.  The problem appears to be how we handle 
asynchronous ZK updates.
Consider the following sequence:
{code}
token = createToken()
cancelToken(token)
verifyToken(token)
{code}
cancelToken deletes the token from the local cache and deletes the znode.  But 
the curator client still receives the child-created event for the original 
create (in the listener thread) and adds the token back.  If that event arrives 
after cancelToken, the token stays in the cache until the listener thread 
receives the corresponding delete event.  (It also just occurred to me that 
this is happening in two different threads, but some of the structures, like 
the currentToken, aren't thread safe.)  The usual way to prevent this is to 
assign versions to the znodes so you can tell whether an update is for an old 
version.  I don't know how to apply that here, where deletes are possible and 
there doesn't appear to be a master responsible for writing (i.e. what prevents 
some other SecretManager from recreating the token just after the delete, and 
how would versions help with that?).  This may affect the keyCache as well as 
the tokenCache.
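The race in 2a) can be replayed deterministically.  The Java sketch below (all class names hypothetical) applies the local create and cancel directly, then delivers the delayed listener events: a naive cache re-adds the token, while a cache that remembers the highest version seen per token, deletes included, drops the stale create.  It assumes every write, including the delete, carries a monotonically increasing version and that only one manager writes.  ZooKeeper does not give you that for free (a recreated znode starts its data version over), which is exactly the open question above.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

// Hypothetical sketch of the createToken / cancelToken / verifyToken race.
final class ListenerRaceSketch {
    // A znode change as seen by the listener thread (hypothetical type).
    static final class Event {
        final String id;
        final boolean created;
        final int version;

        Event(String id, boolean created, int version) {
            this.id = id;
            this.created = created;
            this.version = version;
        }
    }

    interface TokenCache {
        void apply(Event e);
        boolean contains(String id);
    }

    // Applies every event in arrival order, like the current listener.
    static final class NaiveCache implements TokenCache {
        private final Set<String> tokens = new HashSet<>();

        public void apply(Event e) {
            if (e.created) tokens.add(e.id); else tokens.remove(e.id);
        }

        public boolean contains(String id) { return tokens.contains(id); }
    }

    // Remembers the highest version seen per token (deletes included)
    // and ignores any event that is not newer.
    static final class VersionedCache implements TokenCache {
        private final Set<String> tokens = new HashSet<>();
        private final Map<String, Integer> lastVersion = new HashMap<>();

        public void apply(Event e) {
            Integer seen = lastVersion.get(e.id);
            if (seen != null && e.version <= seen) {
                return; // stale event for an older version
            }
            lastVersion.put(e.id, e.version);
            if (e.created) tokens.add(e.id); else tokens.remove(e.id);
        }

        public boolean contains(String id) { return tokens.contains(id); }
    }

    // Returns whether verifyToken would still find the token after cancelToken.
    static boolean visibleAfterCancel(TokenCache cache) {
        Queue<Event> listenerQueue = new ArrayDeque<>();
        Event create = new Event("t1", true, 1);
        Event delete = new Event("t1", false, 2);
        cache.apply(create);               // createToken updates the local cache...
        listenerQueue.add(create);         // ...and ZK notifies the listener later
        cache.apply(delete);               // cancelToken removes it locally...
        listenerQueue.add(delete);         // ...and deletes the znode
        cache.apply(listenerQueue.poll()); // delayed child-created event arrives
        boolean visible = cache.contains("t1"); // verifyToken runs here
        cache.apply(listenerQueue.poll()); // delayed delete event arrives
        return visible;
    }
}
```

visibleAfterCancel(new NaiveCache()) returns true, which matches the intermittent test failure, while visibleAfterCancel(new VersionedCache()) returns false.  Note that VersionedCache keeps a version entry for every deleted token forever; pruning those entries safely is part of the same unresolved delete problem.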

> ZKDelegationTokenSecretManager never shuts down listenerThreadPool
> ------------------------------------------------------------------
>
>                 Key: HADOOP-11157
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11157
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: security
>    Affects Versions: 2.6.0
>            Reporter: Gregory Chanan
>            Assignee: Gregory Chanan
>         Attachments: HADOOP-11157.patch, HADOOP-11157.patch
>
>
> I'm trying to integrate Solr with the DelegationTokenAuthenticationFilter and 
> running into this issue.  The Solr unit tests look for leaked threads, and 
> when I started using the ZKDelegationTokenSecretManager they started reporting 
> leaks.  Shutting down the listenerThreadPool after the objects that use it 
> resolves the leaked-thread errors.
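The fix described in the issue (shut down the listenerThreadPool only after the objects that use it) can be sketched in Java as follows.  This is an illustration under assumptions, not the attached patch: a boolean flag stands in for the curator caches that dispatch listener callbacks on the pool, and the class name and the 10-second timeouts are arbitrary.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: close the pool's users first, then the pool itself.
final class ListenerPoolShutdownSketch {
    private final ExecutorService listenerThreadPool =
        Executors.newSingleThreadExecutor();
    private boolean cachesOpen = true; // stand-in for the caches; guarded by this

    // The caches hand listener callbacks to the pool while they are open.
    synchronized void dispatch(Runnable listener) {
        if (cachesOpen) {
            listenerThreadPool.execute(listener);
        }
    }

    // Returns true if all of the pool's threads have exited.
    boolean stop() {
        synchronized (this) {
            cachesOpen = false; // 1. close the objects that use the pool
        }
        listenerThreadPool.shutdown(); // 2. accept no new tasks; queued ones still run
        try {
            if (!listenerThreadPool.awaitTermination(10, TimeUnit.SECONDS)) {
                listenerThreadPool.shutdownNow(); // interrupt stragglers
                return listenerThreadPool.awaitTermination(10, TimeUnit.SECONDS);
            }
            return true;
        } catch (InterruptedException e) {
            listenerThreadPool.shutdownNow();
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Closing the users first means nothing can submit to the pool after shutdown() (which would throw RejectedExecutionException), and awaitTermination is what makes the threads actually gone by the time a leak checker looks.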



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
