[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17813750#comment-17813750
 ] 

Andor Molnar commented on ZOOKEEPER-4236:
-----------------------------------------

[~dbwong]

I'm going to pick up this issue.

I think that cleaning up Login threads properly is a valid issue, but there's a 
higher-level problem, introduced in ZOOKEEPER-2139, with creating Login threads 
for individual clients. The implementation creates a ZooKeeperSaslClient (and a 
new Login thread) every time it tries to connect, and each one triggers a KDC 
login. If the ZooKeeper server is unavailable for some reason, the client will 
try to connect and log in to the KDC every second, which will bombard the KDC 
with unnecessary login requests.

I'll create a patch to separate the Login thread from the ZK server 
reconnection mechanism and make sure the Login thread is shut down only when 
the client shuts down.
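
As a rough illustration of that separation (not the actual patch), the sketch 
below has the client own a single login object for its whole lifetime. Every 
reconnect attempt reuses it, and it is shut down only when the client is 
closed. SketchLogin, SketchClient, loginForConnect and KDC_LOGINS are 
hypothetical names invented for this example; SketchLogin merely counts logins 
instead of contacting a KDC and is not org.apache.zookeeper.Login.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a Login-like object. It only counts how many
// times a "KDC login" would have happened; it never talks to a real KDC.
final class SketchLogin {
    static final AtomicInteger KDC_LOGINS = new AtomicInteger();
    private volatile boolean shutdown;

    SketchLogin() {
        // Assumption: a real implementation would do the JAAS login here.
        KDC_LOGINS.incrementAndGet();
    }

    void shutdown() {
        shutdown = true;
    }

    boolean isShutdown() {
        return shutdown;
    }
}

// Hypothetical client that ties the login lifetime to the client lifetime
// rather than to individual connection attempts.
final class SketchClient implements AutoCloseable {
    private SketchLogin login; // guarded by "this"

    // Called on every (re)connect attempt; only the first call logs in.
    synchronized SketchLogin loginForConnect() {
        if (login == null) {
            login = new SketchLogin();
        }
        return login;
    }

    // The login is released only when the client itself shuts down.
    @Override
    public synchronized void close() {
        if (login != null) {
            login.shutdown();
        }
    }
}
```

With this shape, a client that retries its connection once per second while 
the server is down would still perform a single login, because the retry path 
never constructs a new login object.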

> Java Client SendThread create many unnecessary Login objects
> ------------------------------------------------------------
>
>                 Key: ZOOKEEPER-4236
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4236
>             Project: ZooKeeper
>          Issue Type: Bug
>            Reporter: Daniel Wong
>            Priority: Minor
>
> Hi, I am an Apache Phoenix committer and I help manage many ZooKeeper 
> clusters at my employer, primarily using ZK for HBase use cases.  We 
> recently had a production incident where some of our ACLs were not set up, 
> preventing connectivity from the client to the ZK nodes, and the failure path 
> exposed 2 issues to fix: this Jira and ZOOKEEPER-4235.  This Jira is the less 
> important of the 2 and concerns the creation of numerous Login objects.  We 
> had hundreds of threads per JVM with the following stack trace.  
> {code:java}
> java.lang.Thread.State: RUNNABLE
>  at java.net.PlainSocketImpl.socketConnect([email protected]/Native Method)
>  at java.net.AbstractPlainSocketImpl.doConnect([email protected]/AbstractPlainSocketImpl.java:399)
>  - locked <0x00000015004fde20> (a java.net.SocksSocketImpl)
>  at java.net.AbstractPlainSocketImpl.connectToAddress([email protected]/AbstractPlainSocketImpl.java:242)
>  at java.net.AbstractPlainSocketImpl.connect([email protected]/AbstractPlainSocketImpl.java:224)
>  at java.net.SocksSocketImpl.connect([email protected]/SocksSocketImpl.java:403)
>  at java.net.Socket.connect([email protected]/Socket.java:609)
>  at sun.security.krb5.internal.TCPClient.<init>([email protected]/NetClient.java:62)
>  at sun.security.krb5.internal.NetClient.getInstance([email protected]/NetClient.java:42)
>  at sun.security.krb5.KdcComm$KdcCommunication.run([email protected]/KdcComm.java:401)
>  at sun.security.krb5.KdcComm$KdcCommunication.run([email protected]/KdcComm.java:364)
>  at java.security.AccessController.doPrivileged([email protected]/Native Method)
>  at sun.security.krb5.KdcComm.send([email protected]/KdcComm.java:348)
>  at sun.security.krb5.KdcComm.sendIfPossible([email protected]/KdcComm.java:253)
>  at sun.security.krb5.KdcComm.send([email protected]/KdcComm.java:234)
>  at sun.security.krb5.KdcComm.send([email protected]/KdcComm.java:200)
>  at sun.security.krb5.KrbAsReqBuilder.send([email protected]/KrbAsReqBuilder.java:326)
>  at sun.security.krb5.KrbAsReqBuilder.action([email protected]/KrbAsReqBuilder.java:371)
>  at com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication([email protected]/Krb5LoginModule.java:754)
>  at com.sun.security.auth.module.Krb5LoginModule.login([email protected]/Krb5LoginModule.java:592)
>  at javax.security.auth.login.LoginContext.invoke([email protected]/LoginContext.java:726)
>  at javax.security.auth.login.LoginContext$4.run([email protected]/LoginContext.java:665)
>  at javax.security.auth.login.LoginContext$4.run([email protected]/LoginContext.java:663)
>  at java.security.AccessController.doPrivileged([email protected]/Native Method)
>  at javax.security.auth.login.LoginContext.invokePriv([email protected]/LoginContext.java:663)
>  at javax.security.auth.login.LoginContext.login([email protected]/LoginContext.java:574)
>  at org.apache.zookeeper.Login.login(Login.java:304)
>  - locked <0x000000151c477148> (a org.apache.zookeeper.Login)
>  at org.apache.zookeeper.Login.<init>(Login.java:106)
>  at org.apache.zookeeper.client.ZooKeeperSaslClient.createSaslClient(ZooKeeperSaslClient.java:249)
>  - locked <0x000000151c476f68> (a org.apache.zookeeper.client.ZooKeeperSaslClient)
>  at org.apache.zookeeper.client.ZooKeeperSaslClient.<init>(ZooKeeperSaslClient.java:141)
>  at org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:972)
>  at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1031)
> {code}
> Note that these were logging in to our 10 ZK nodes but we had 100s of Logins. 
> In theory we should need at most 10 Logins.  
> This Jira is intended to improve the behavior by limiting the number of Login 
> objects/clients to the number needed.  Note that a combination of JIRAs 
> https://issues.apache.org/jira/browse/ZOOKEEPER-2375 and 
> https://issues.apache.org/jira/browse/ZOOKEEPER-2139 removed the singleton 
> at the Login level but left in unnecessary synchronization code.  This could 
> be improved again via either a singleton, perhaps at the SaslClient layer, or 
> some sort of connection -> login cache so that new connections would 
> reuse/wait for the same objects in failure paths.
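
The "connection -> login cache" idea from the description could look roughly 
like the sketch below. It is only an illustration under stated assumptions: 
LoginCacheSketch and loginFor are hypothetical names, the cache key (for 
example a server address) is an assumption, and the type parameter L stands in 
for whatever Login-like object would be shared. It relies on 
ConcurrentHashMap.computeIfAbsent, which runs the factory at most once per key 
and makes concurrent callers for the same key wait for that result.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Hypothetical cache: one Login-like object per key, shared by all
// connections that use the same key.
final class LoginCacheSketch<L> {
    private final ConcurrentMap<String, L> logins = new ConcurrentHashMap<>();
    private final Function<String, L> factory;

    LoginCacheSketch(Function<String, L> factory) {
        this.factory = factory;
    }

    // Returns the cached object for this key, creating it on first use.
    // If the factory throws, no mapping is recorded, so a later caller
    // will attempt the creation again.
    L loginFor(String key) {
        return logins.computeIfAbsent(key, factory);
    }

    int size() {
        return logins.size();
    }
}
```

One trade-off to keep in mind: computeIfAbsent can block other updates to the 
map while the factory runs, so a slow KDC login inside the factory may delay 
lookups for other keys as well.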



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
