[ 
https://issues.apache.org/jira/browse/HIVE-21993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-21993:
------------------------------------
    Description: 
DelegationStore can be configured to run in-mem/DB/ZK based TokenStores. 
{{TokenStoreDelegationTokenSecretManager}} purges the tokens (>24 hours) 
periodically every 1 hour by default. 

+Issue:+
When large number of delegation tokens are present in ZK, 
{{TokenStoreDelegationTokenSecretManager::removeExpiredTokens}} can throw the 
following exception when connecting to ZK.

{noformat}
WARN [main-SendThread(xyz:2181)]: org.apache.zookeeper.ClientCnxn: Session 
0x36a161083865cd9 for server xyz/1.2.3.4:2181, unexpected error, closing socket 
connection and attempting reconnect
java.io.IOException: Packet len68985070 is out of range!
        at 
org.apache.zookeeper.ClientCnxnSocket.readLength(ClientCnxnSocket.java:112) 
~[zookeeper-3.4.6.jar]
        at 
org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:79) 
~[zookeeper-3.4.6.jar]
        at 
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366)
 ~[zookeeper-3.4.6.jar]
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081) 
[zookeeper-3.4.6.jar]

...
...

 INFO [main-EventThread]: 
org.apache.curator.framework.state.ConnectionStateManager: State change: 
SUSPENDED
 ERROR [Thread[Thread-13,5,main]]: 
org.apache.hadoop.hive.thrift.TokenStoreDelegationTokenSecretManager: 
ExpiredTokenRemover thread received unexpected exception. 
org.apache.hadoop.hive.thrift.DelegationTokenStore$TokenStoreException: Error 
getting children for /hivedelegationMETASTORE/tokens
org.apache.hadoop.hive.thrift.DelegationTokenStore$TokenStoreException: Error 
getting children for /hivedelegationMETASTORE/tokens
        at 
org.apache.hadoop.hive.thrift.ZooKeeperTokenStore.zkGetChildren(ZooKeeperTokenStore.java:280)
 ~[hive-exec-x.y.z.jar]
        at 
org.apache.hadoop.hive.thrift.ZooKeeperTokenStore.getAllDelegationTokenIdentifiers(ZooKeeperTokenStore.java:413)
 ~[hive-exec-x.y.z.jar]
        at 
org.apache.hadoop.hive.thrift.TokenStoreDelegationTokenSecretManager.removeExpiredTokens(TokenStoreDelegationTokenSecretManager.java:238)
 ~[hive-exec-x.y.z.jar]
        at 
org.apache.hadoop.hive.thrift.TokenStoreDelegationTokenSecretManager$ExpiredTokenRemover.run(TokenStoreDelegationTokenSecretManager.java:309)
 [hive-exec-x.y.z.jar]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_171]
Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: 
KeeperErrorCode = ConnectionLoss for /hivedelegationMETASTORE/tokens
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:99) 
~[zookeeper-3.4.6.jar]
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) 
~[zookeeper-3.4.6.jar]
        at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1590) 
~[zookeeper-3.4.6.jar]
        at 
org.apache.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:214)
 ~[curator-framework-2.7.1.jar:?]
        at 
org.apache.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:203)
 ~[curator-framework-2.7.1.jar:?]
        at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107) 
~[curator-client-2.7.1.jar:?]
        at 
org.apache.curator.framework.imps.GetChildrenBuilderImpl.pathInForeground(GetChildrenBuilderImpl.java:200)
 ~[curator-framework-2.7.1.jar:?]
        at 
org.apache.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:191)
 ~[curator-framework-2.7.1.jar:?]
        at 
org.apache.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:38)
 ~[curator-framework-2.7.1.jar:?]
        at 
org.apache.hadoop.hive.thrift.ZooKeeperTokenStore.zkGetChildren(ZooKeeperTokenStore.java:278)
 ~[hive-exec-x.y.z.jar]
        ... 4 more        
{noformat}

When packet length is greater than {{jute.maxBuffer}}, it ends up throwing this 
exception and it reconnects the connection.

However, the same ZK client is being used for {{addToken, removeToken}} calls 
which are run in different threads. This creates problems when 
creating/deleting tokens.

1. Issue in creating tokens:

New token is added at the same time, when ZK client is in suspended state (due 
to above mentioned reason). Node is already created by curator, but before it 
could verify connection gets into stale state. So curator framework retries and 
ends up with the following exception. So creating tokens fails often.

{noformat}

Caused by: 
org.apache.hadoop.hive.thrift.DelegationTokenStore$TokenStoreException: Error 
creating new node with path 
/hivedelegationMETASTORE/tokens/AAVwaXBlcgVwaXBlckdoaXZlL2hhZG9vcHNlcnZpY2UyNy1waHgyLnByb2QudWJlci5pbnRlcm5hbEBEQVRBU1JFLlBST0QuVUJFUi5JTlRFUk5BTIoBa9UkJgmKAWv5MKoJjQEo9Y4IUQ
        at 
org.apache.hadoop.hive.thrift.ZooKeeperTokenStore.addToken(ZooKeeperTokenStore.java:382)
        at 
org.apache.hadoop.hive.thrift.TokenStoreDelegationTokenSecretManager.createPassword(TokenStoreDelegationTokenSecretManager.java:142)
        at 
org.apache.hadoop.hive.thrift.TokenStoreDelegationTokenSecretManager.createPassword(TokenStoreDelegationTokenSecretManager.java:56)
        at org.apache.hadoop.security.token.Token.<init>(Token.java:62)
        at 
org.apache.hadoop.hive.thrift.DelegationTokenSecretManager.getDelegationToken(DelegationTokenSecretManager.java:109)
        at 
org.apache.hadoop.hive.thrift.HiveDelegationTokenManager$1.run(HiveDelegationTokenManager.java:131)
        at 
org.apache.hadoop.hive.thrift.HiveDelegationTokenManager$1.run(HiveDelegationTokenManager.java:127)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
        at 
org.apache.hadoop.hive.thrift.HiveDelegationTokenManager.getDelegationToken(HiveDelegationTokenManager.java:127)
        at 
org.apache.hadoop.hive.metastore.HiveMetaStore.getDelegationToken(HiveMetaStore.java:7060)
        at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_delegation_token(HiveMetaStore.java:5970)
        ... 20 more
Caused by: org.apache.zookeeper.KeeperException$NodeExistsException: 
KeeperErrorCode = NodeExists for 
/hivedelegationMETASTORE/tokens/AAVwaXBlcgVwaXBlckdoaXZlL2hhZG9vcHNlcnZpY2UyNy1waHgyLnByb2QudWJlci5pbnRlcm5hbEBEQVRBU1JFLlBST0QuVUJFUi5JTlRFUk5BTIoBa9UkJgmKAWv5MKoJjQEo9Y4IUQ
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:119)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
        at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
        at 
org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:691)
        at 
org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:675)
        at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
        at 
org.apache.curator.framework.imps.CreateBuilderImpl.pathInForeground(CreateBuilderImpl.java:672)
        at 
org.apache.curator.framework.imps.CreateBuilderImpl.protectedPathInForeground(CreateBuilderImpl.java:453)
        at 
org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:443)
        at 
org.apache.curator.framework.imps.CreateBuilderImpl$3.forPath(CreateBuilderImpl.java:251)
        at 
org.apache.curator.framework.imps.CreateBuilderImpl$3.forPath(CreateBuilderImpl.java:205)
        at 
org.apache.hadoop.hive.thrift.ZooKeeperTokenStore.addToken(ZooKeeperTokenStore.java:380)
        ... 32 more
{noformat}



2. Issue in deleting tokens:

Deleting tokens ends up with following exception when expiry thread breaks the 
ZK client. So it does not clear up the old tokens and keeps building more 
pressure on packet length.

{noformat}


Caused by: 
org.apache.hadoop.hive.thrift.DelegationTokenStore$TokenStoreException: Error 
deleting 
/hivedelegationMETASTORE/tokens/AAVwaXBlcgVwaXBlckdoaXZlL2hhZG9vcHNlcnZpY2UyNS1waHgyLnByb2QudWJlci5pbnRlcm5hbEBEQVRBU1JFLlBST0QuVUJFUi5JTlRFUk5BTIoBa9ex1FWKAWv7vlhVjQGHdI4IUA
        at 
org.apache.hadoop.hive.thrift.ZooKeeperTokenStore.zkDelete(ZooKeeperTokenStore.java:340)
        at 
org.apache.hadoop.hive.thrift.ZooKeeperTokenStore.removeToken(ZooKeeperTokenStore.java:392)
        at 
org.apache.hadoop.hive.thrift.TokenStoreDelegationTokenSecretManager.cancelToken(TokenStoreDelegationTokenSecretManager.java:122)
        at 
org.apache.hadoop.hive.thrift.TokenStoreDelegationTokenSecretManager.cancelToken(TokenStoreDelegationTokenSecretManager.java:56)
        at 
org.apache.hadoop.hive.thrift.DelegationTokenSecretManager.cancelDelegationToken(DelegationTokenSecretManager.java:90)
        at 
org.apache.hadoop.hive.thrift.HiveDelegationTokenManager.cancelDelegationToken(HiveDelegationTokenManager.java:152)
        at 
org.apache.hadoop.hive.metastore.HiveMetaStore.cancelDelegationToken(HiveMetaStore.java:7049)
        at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.cancel_delegation_token(HiveMetaStore.java:5916)
        ... 20 more
Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: 
KeeperErrorCode = ConnectionLoss for 
/hivedelegationMETASTORE/tokens/AAVwaXBlcgVwaXBlckdoaXZlL2hhZG9vcHNlcnZpY2UyNS1waHgyLnByb2QudWJlci5pbnRlcm5hbEBEQVRBU1JFLlBST0QuVUJFUi5JTlRFUk5BTIoBa9ex1FWKAWv7vlhVjQGHdI4IUA
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
        at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:873)
        at 
org.apache.curator.framework.imps.DeleteBuilderImpl$5.call(DeleteBuilderImpl.java:238)
        at 
org.apache.curator.framework.imps.DeleteBuilderImpl$5.call(DeleteBuilderImpl.java:233)
        at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
        at 
org.apache.curator.framework.imps.DeleteBuilderImpl.pathInForeground(DeleteBuilderImpl.java:230)
        at 
org.apache.curator.framework.imps.DeleteBuilderImpl.forPath(DeleteBuilderImpl.java:214)
        at 
org.apache.curator.framework.imps.DeleteBuilderImpl.forPath(DeleteBuilderImpl.java:41)
        at 
org.apache.hadoop.hive.thrift.ZooKeeperTokenStore.zkDelete(ZooKeeperTokenStore.java:336)
        ... 27 more

{noformat}



And this process repeats every 5 seconds due to expiry thread setting

https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/security/TokenStoreDelegationTokenSecretManager.java#L312

{{lastTokenCacheCleanup}} never gets reset as well; so expiring tokens starts 
happening every 5 seconds instead of every hour 
https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/security/TokenStoreDelegationTokenSecretManager.java#L309


Overall, system starts to run in degraded mode as this happens every 5 seconds. 



1. https://issues.apache.org/jira/browse/ZOOKEEPER-2933 tries to provide a way 
to monitor "jute.maxBuffer". Not sure if curator frameworks make use of this 
recently.
2. Another option is to update {{lastTokenCacheCleanup}} just before invoking 
{{removeExpiredTokens}}. This way, it still honours the expiry removal 
interval. In the exception, it would be good to add details about 
{{jute.maxBuffer}}, so that users can configure it to a higher value during 
next restart. 
3. Another option is to set {{jute.maxBuffer}} dynamically. 200 MB max limit 
would allow to clear up ~1.6 million entries. Additional message in the 
exception handling can give more info on this.



  was:
DelegationStore can be configured to run in-mem/DB/ZK based TokenStores. 
{{TokenStoreDelegationTokenSecretManager}} purges the tokens (>24 hours) 
periodically every 1 hour by default. 

+Issue:+
When large number of delegation tokens are present in ZK, 
{{TokenStoreDelegationTokenSecretManager::removeExpiredTokens}} can throw the 
following exception when connecting to ZK.

{{noformat}}
WARN [main-SendThread(xyz:2181)]: org.apache.zookeeper.ClientCnxn: Session 
0x36a161083865cd9 for server xyz/1.2.3.4:2181, unexpected error, closing socket 
connection and attempting reconnect
java.io.IOException: Packet len68985070 is out of range!
        at 
org.apache.zookeeper.ClientCnxnSocket.readLength(ClientCnxnSocket.java:112) 
~[zookeeper-3.4.6.jar]
        at 
org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:79) 
~[zookeeper-3.4.6.jar]
        at 
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366)
 ~[zookeeper-3.4.6.jar]
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081) 
[zookeeper-3.4.6.jar]

...
...

 INFO [main-EventThread]: 
org.apache.curator.framework.state.ConnectionStateManager: State change: 
SUSPENDED
 ERROR [Thread[Thread-13,5,main]]: 
org.apache.hadoop.hive.thrift.TokenStoreDelegationTokenSecretManager: 
ExpiredTokenRemover thread received unexpected exception. 
org.apache.hadoop.hive.thrift.DelegationTokenStore$TokenStoreException: Error 
getting children for /hivedelegationMETASTORE/tokens
org.apache.hadoop.hive.thrift.DelegationTokenStore$TokenStoreException: Error 
getting children for /hivedelegationMETASTORE/tokens
        at 
org.apache.hadoop.hive.thrift.ZooKeeperTokenStore.zkGetChildren(ZooKeeperTokenStore.java:280)
 ~[hive-exec-x.y.z.jar]
        at 
org.apache.hadoop.hive.thrift.ZooKeeperTokenStore.getAllDelegationTokenIdentifiers(ZooKeeperTokenStore.java:413)
 ~[hive-exec-x.y.z.jar]
        at 
org.apache.hadoop.hive.thrift.TokenStoreDelegationTokenSecretManager.removeExpiredTokens(TokenStoreDelegationTokenSecretManager.java:238)
 ~[hive-exec-x.y.z.jar]
        at 
org.apache.hadoop.hive.thrift.TokenStoreDelegationTokenSecretManager$ExpiredTokenRemover.run(TokenStoreDelegationTokenSecretManager.java:309)
 [hive-exec-x.y.z.jar]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_171]
Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: 
KeeperErrorCode = ConnectionLoss for /hivedelegationMETASTORE/tokens
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:99) 
~[zookeeper-3.4.6.jar]
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) 
~[zookeeper-3.4.6.jar]
        at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1590) 
~[zookeeper-3.4.6.jar]
        at 
org.apache.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:214)
 ~[curator-framework-2.7.1.jar:?]
        at 
org.apache.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:203)
 ~[curator-framework-2.7.1.jar:?]
        at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107) 
~[curator-client-2.7.1.jar:?]
        at 
org.apache.curator.framework.imps.GetChildrenBuilderImpl.pathInForeground(GetChildrenBuilderImpl.java:200)
 ~[curator-framework-2.7.1.jar:?]
        at 
org.apache.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:191)
 ~[curator-framework-2.7.1.jar:?]
        at 
org.apache.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:38)
 ~[curator-framework-2.7.1.jar:?]
        at 
org.apache.hadoop.hive.thrift.ZooKeeperTokenStore.zkGetChildren(ZooKeeperTokenStore.java:278)
 ~[hive-exec-x.y.z.jar]
        ... 4 more        
{{noformat}}

When packet length is greater than {{jute.maxBuffer}}, it ends up throwing this 
exception and it reconnects the connection.

However, the same ZK client is being used for {{addToken, removeToken}} calls 
which are run in different threads. This creates problems when 
creating/deleting tokens.

1. Issue in creating tokens:

New token is added at the same time, when ZK client is in suspended state (due 
to above mentioned reason). Node is already created by curator, but before it 
could verify connection gets into stale state. So curator framework retries and 
ends up with the following exception. So creating tokens fails often.

{{noformat}}

Caused by: 
org.apache.hadoop.hive.thrift.DelegationTokenStore$TokenStoreException: Error 
creating new node with path 
/hivedelegationMETASTORE/tokens/AAVwaXBlcgVwaXBlckdoaXZlL2hhZG9vcHNlcnZpY2UyNy1waHgyLnByb2QudWJlci5pbnRlcm5hbEBEQVRBU1JFLlBST0QuVUJFUi5JTlRFUk5BTIoBa9UkJgmKAWv5MKoJjQEo9Y4IUQ
        at 
org.apache.hadoop.hive.thrift.ZooKeeperTokenStore.addToken(ZooKeeperTokenStore.java:382)
        at 
org.apache.hadoop.hive.thrift.TokenStoreDelegationTokenSecretManager.createPassword(TokenStoreDelegationTokenSecretManager.java:142)
        at 
org.apache.hadoop.hive.thrift.TokenStoreDelegationTokenSecretManager.createPassword(TokenStoreDelegationTokenSecretManager.java:56)
        at org.apache.hadoop.security.token.Token.<init>(Token.java:62)
        at 
org.apache.hadoop.hive.thrift.DelegationTokenSecretManager.getDelegationToken(DelegationTokenSecretManager.java:109)
        at 
org.apache.hadoop.hive.thrift.HiveDelegationTokenManager$1.run(HiveDelegationTokenManager.java:131)
        at 
org.apache.hadoop.hive.thrift.HiveDelegationTokenManager$1.run(HiveDelegationTokenManager.java:127)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
        at 
org.apache.hadoop.hive.thrift.HiveDelegationTokenManager.getDelegationToken(HiveDelegationTokenManager.java:127)
        at 
org.apache.hadoop.hive.metastore.HiveMetaStore.getDelegationToken(HiveMetaStore.java:7060)
        at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_delegation_token(HiveMetaStore.java:5970)
        ... 20 more
Caused by: org.apache.zookeeper.KeeperException$NodeExistsException: 
KeeperErrorCode = NodeExists for 
/hivedelegationMETASTORE/tokens/AAVwaXBlcgVwaXBlckdoaXZlL2hhZG9vcHNlcnZpY2UyNy1waHgyLnByb2QudWJlci5pbnRlcm5hbEBEQVRBU1JFLlBST0QuVUJFUi5JTlRFUk5BTIoBa9UkJgmKAWv5MKoJjQEo9Y4IUQ
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:119)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
        at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
        at 
org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:691)
        at 
org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:675)
        at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
        at 
org.apache.curator.framework.imps.CreateBuilderImpl.pathInForeground(CreateBuilderImpl.java:672)
        at 
org.apache.curator.framework.imps.CreateBuilderImpl.protectedPathInForeground(CreateBuilderImpl.java:453)
        at 
org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:443)
        at 
org.apache.curator.framework.imps.CreateBuilderImpl$3.forPath(CreateBuilderImpl.java:251)
        at 
org.apache.curator.framework.imps.CreateBuilderImpl$3.forPath(CreateBuilderImpl.java:205)
        at 
org.apache.hadoop.hive.thrift.ZooKeeperTokenStore.addToken(ZooKeeperTokenStore.java:380)
        ... 32 more
{{noformat}}



2. Issue in deleting tokens:

Deleting tokens ends up with following exception when expiry thread breaks the 
ZK client. So it does not clear up the old tokens and keeps building more 
pressure on packet length.

{{noformat}}


Caused by: 
org.apache.hadoop.hive.thrift.DelegationTokenStore$TokenStoreException: Error 
deleting 
/hivedelegationMETASTORE/tokens/AAVwaXBlcgVwaXBlckdoaXZlL2hhZG9vcHNlcnZpY2UyNS1waHgyLnByb2QudWJlci5pbnRlcm5hbEBEQVRBU1JFLlBST0QuVUJFUi5JTlRFUk5BTIoBa9ex1FWKAWv7vlhVjQGHdI4IUA
        at 
org.apache.hadoop.hive.thrift.ZooKeeperTokenStore.zkDelete(ZooKeeperTokenStore.java:340)
        at 
org.apache.hadoop.hive.thrift.ZooKeeperTokenStore.removeToken(ZooKeeperTokenStore.java:392)
        at 
org.apache.hadoop.hive.thrift.TokenStoreDelegationTokenSecretManager.cancelToken(TokenStoreDelegationTokenSecretManager.java:122)
        at 
org.apache.hadoop.hive.thrift.TokenStoreDelegationTokenSecretManager.cancelToken(TokenStoreDelegationTokenSecretManager.java:56)
        at 
org.apache.hadoop.hive.thrift.DelegationTokenSecretManager.cancelDelegationToken(DelegationTokenSecretManager.java:90)
        at 
org.apache.hadoop.hive.thrift.HiveDelegationTokenManager.cancelDelegationToken(HiveDelegationTokenManager.java:152)
        at 
org.apache.hadoop.hive.metastore.HiveMetaStore.cancelDelegationToken(HiveMetaStore.java:7049)
        at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.cancel_delegation_token(HiveMetaStore.java:5916)
        ... 20 more
Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: 
KeeperErrorCode = ConnectionLoss for 
/hivedelegationMETASTORE/tokens/AAVwaXBlcgVwaXBlckdoaXZlL2hhZG9vcHNlcnZpY2UyNS1waHgyLnByb2QudWJlci5pbnRlcm5hbEBEQVRBU1JFLlBST0QuVUJFUi5JTlRFUk5BTIoBa9ex1FWKAWv7vlhVjQGHdI4IUA
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
        at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:873)
        at 
org.apache.curator.framework.imps.DeleteBuilderImpl$5.call(DeleteBuilderImpl.java:238)
        at 
org.apache.curator.framework.imps.DeleteBuilderImpl$5.call(DeleteBuilderImpl.java:233)
        at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
        at 
org.apache.curator.framework.imps.DeleteBuilderImpl.pathInForeground(DeleteBuilderImpl.java:230)
        at 
org.apache.curator.framework.imps.DeleteBuilderImpl.forPath(DeleteBuilderImpl.java:214)
        at 
org.apache.curator.framework.imps.DeleteBuilderImpl.forPath(DeleteBuilderImpl.java:41)
        at 
org.apache.hadoop.hive.thrift.ZooKeeperTokenStore.zkDelete(ZooKeeperTokenStore.java:336)
        ... 27 more

{{noformat}}



And this process repeats every 5 seconds due to expiry thread setting

https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/security/TokenStoreDelegationTokenSecretManager.java#L312

{{lastTokenCacheCleanup}} never gets reset as well; so expiring tokens starts 
happening every 5 seconds instead of every hour 
https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/security/TokenStoreDelegationTokenSecretManager.java#L309


Overall, system starts to run in degraded mode as this happens every 5 seconds. 



1. https://issues.apache.org/jira/browse/ZOOKEEPER-2933 tries to provide a way 
to monitor "jute.maxBuffer". Not sure if curator frameworks make use of this 
recently.
2. Another option is to update {{lastTokenCacheCleanup}} just before invoking 
{{removeExpiredTokens}}. This way, it still honours the expiry removal 
interval. In the exception, it would be good to add details about 
{{jute.maxBuffer}}, so that users can configure it to a higher value during 
next restart. 
3. Another option is to set {{jute.maxBuffer}} dynamically. 200 MB max limit 
would allow to clear up ~1.6 million entries. Additional message in the 
exception handling can give more info on this.




> HS/HMS delegationstore with ZK can degrade performance when jute.maxBuffer is 
> reached
> -------------------------------------------------------------------------------------
>
>                 Key: HIVE-21993
>                 URL: https://issues.apache.org/jira/browse/HIVE-21993
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 3.0.0, 4.0.0, 2.3.4
>            Reporter: Rajesh Balamohan
>            Priority: Major
>
> DelegationStore can be configured to run in-mem/DB/ZK based TokenStores. 
> {{TokenStoreDelegationTokenSecretManager}} purges the tokens (>24 hours) 
> periodically every 1 hour by default. 
> +Issue:+
> When large number of delegation tokens are present in ZK, 
> {{TokenStoreDelegationTokenSecretManager::removeExpiredTokens}} can throw the 
> following exception when connecting to ZK.
> {noformat}
> WARN [main-SendThread(xyz:2181)]: org.apache.zookeeper.ClientCnxn: Session 
> 0x36a161083865cd9 for server xyz/1.2.3.4:2181, unexpected error, closing 
> socket connection and attempting reconnect
> java.io.IOException: Packet len68985070 is out of range!
>         at 
> org.apache.zookeeper.ClientCnxnSocket.readLength(ClientCnxnSocket.java:112) 
> ~[zookeeper-3.4.6.jar]
>         at 
> org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:79) 
> ~[zookeeper-3.4.6.jar]
>         at 
> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366)
>  ~[zookeeper-3.4.6.jar]
>         at 
> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081) 
> [zookeeper-3.4.6.jar]
> ...
> ...
>  INFO [main-EventThread]: 
> org.apache.curator.framework.state.ConnectionStateManager: State change: 
> SUSPENDED
>  ERROR [Thread[Thread-13,5,main]]: 
> org.apache.hadoop.hive.thrift.TokenStoreDelegationTokenSecretManager: 
> ExpiredTokenRemover thread received unexpected exception. 
> org.apache.hadoop.hive.thrift.DelegationTokenStore$TokenStoreException: Error 
> getting children for /hivedelegationMETASTORE/tokens
> org.apache.hadoop.hive.thrift.DelegationTokenStore$TokenStoreException: Error 
> getting children for /hivedelegationMETASTORE/tokens
>         at 
> org.apache.hadoop.hive.thrift.ZooKeeperTokenStore.zkGetChildren(ZooKeeperTokenStore.java:280)
>  ~[hive-exec-x.y.z.jar]
>         at 
> org.apache.hadoop.hive.thrift.ZooKeeperTokenStore.getAllDelegationTokenIdentifiers(ZooKeeperTokenStore.java:413)
>  ~[hive-exec-x.y.z.jar]
>         at 
> org.apache.hadoop.hive.thrift.TokenStoreDelegationTokenSecretManager.removeExpiredTokens(TokenStoreDelegationTokenSecretManager.java:238)
>  ~[hive-exec-x.y.z.jar]
>         at 
> org.apache.hadoop.hive.thrift.TokenStoreDelegationTokenSecretManager$ExpiredTokenRemover.run(TokenStoreDelegationTokenSecretManager.java:309)
>  [hive-exec-x.y.z.jar]
>         at java.lang.Thread.run(Thread.java:748) [?:1.8.0_171]
> Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: 
> KeeperErrorCode = ConnectionLoss for /hivedelegationMETASTORE/tokens
>         at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:99) 
> ~[zookeeper-3.4.6.jar]
>         at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:51) 
> ~[zookeeper-3.4.6.jar]
>         at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1590) 
> ~[zookeeper-3.4.6.jar]
>         at 
> org.apache.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:214)
>  ~[curator-framework-2.7.1.jar:?]
>         at 
> org.apache.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:203)
>  ~[curator-framework-2.7.1.jar:?]
>         at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107) 
> ~[curator-client-2.7.1.jar:?]
>         at 
> org.apache.curator.framework.imps.GetChildrenBuilderImpl.pathInForeground(GetChildrenBuilderImpl.java:200)
>  ~[curator-framework-2.7.1.jar:?]
>         at 
> org.apache.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:191)
>  ~[curator-framework-2.7.1.jar:?]
>         at 
> org.apache.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:38)
>  ~[curator-framework-2.7.1.jar:?]
>         at 
> org.apache.hadoop.hive.thrift.ZooKeeperTokenStore.zkGetChildren(ZooKeeperTokenStore.java:278)
>  ~[hive-exec-x.y.z.jar]
>         ... 4 more        
> {noformat}
> When packet length is greater than {{jute.maxBuffer}}, it ends up throwing 
> this exception and it reconnects the connection.
> However, the same ZK client is being used for {{addToken, removeToken}} calls 
> which are run in different threads. This creates problems when 
> creating/deleting tokens.
> 1. Issue in creating tokens:
> New token is added at the same time, when ZK client is in suspended state 
> (due to above mentioned reason). Node is already created by curator, but 
> before it could verify connection gets into stale state. So curator framework 
> retries and ends up with the following exception. So creating tokens fails 
> often.
> {noformat}
> Caused by: 
> org.apache.hadoop.hive.thrift.DelegationTokenStore$TokenStoreException: Error 
> creating new node with path 
> /hivedelegationMETASTORE/tokens/AAVwaXBlcgVwaXBlckdoaXZlL2hhZG9vcHNlcnZpY2UyNy1waHgyLnByb2QudWJlci5pbnRlcm5hbEBEQVRBU1JFLlBST0QuVUJFUi5JTlRFUk5BTIoBa9UkJgmKAWv5MKoJjQEo9Y4IUQ
>         at 
> org.apache.hadoop.hive.thrift.ZooKeeperTokenStore.addToken(ZooKeeperTokenStore.java:382)
>         at 
> org.apache.hadoop.hive.thrift.TokenStoreDelegationTokenSecretManager.createPassword(TokenStoreDelegationTokenSecretManager.java:142)
>         at 
> org.apache.hadoop.hive.thrift.TokenStoreDelegationTokenSecretManager.createPassword(TokenStoreDelegationTokenSecretManager.java:56)
>         at org.apache.hadoop.security.token.Token.<init>(Token.java:62)
>         at 
> org.apache.hadoop.hive.thrift.DelegationTokenSecretManager.getDelegationToken(DelegationTokenSecretManager.java:109)
>         at 
> org.apache.hadoop.hive.thrift.HiveDelegationTokenManager$1.run(HiveDelegationTokenManager.java:131)
>         at 
> org.apache.hadoop.hive.thrift.HiveDelegationTokenManager$1.run(HiveDelegationTokenManager.java:127)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
>         at 
> org.apache.hadoop.hive.thrift.HiveDelegationTokenManager.getDelegationToken(HiveDelegationTokenManager.java:127)
>         at 
> org.apache.hadoop.hive.metastore.HiveMetaStore.getDelegationToken(HiveMetaStore.java:7060)
>         at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_delegation_token(HiveMetaStore.java:5970)
>         ... 20 more
> Caused by: org.apache.zookeeper.KeeperException$NodeExistsException: 
> KeeperErrorCode = NodeExists for 
> /hivedelegationMETASTORE/tokens/AAVwaXBlcgVwaXBlckdoaXZlL2hhZG9vcHNlcnZpY2UyNy1waHgyLnByb2QudWJlci5pbnRlcm5hbEBEQVRBU1JFLlBST0QuVUJFUi5JTlRFUk5BTIoBa9UkJgmKAWv5MKoJjQEo9Y4IUQ
>         at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:119)
>         at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>         at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
>         at 
> org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:691)
>         at 
> org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:675)
>         at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
>         at 
> org.apache.curator.framework.imps.CreateBuilderImpl.pathInForeground(CreateBuilderImpl.java:672)
>         at 
> org.apache.curator.framework.imps.CreateBuilderImpl.protectedPathInForeground(CreateBuilderImpl.java:453)
>         at 
> org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:443)
>         at 
> org.apache.curator.framework.imps.CreateBuilderImpl$3.forPath(CreateBuilderImpl.java:251)
>         at 
> org.apache.curator.framework.imps.CreateBuilderImpl$3.forPath(CreateBuilderImpl.java:205)
>         at 
> org.apache.hadoop.hive.thrift.ZooKeeperTokenStore.addToken(ZooKeeperTokenStore.java:380)
>         ... 32 more
> {noformat}
> 2. Issue in deleting tokens:
> Deleting tokens ends up with following exception when expiry thread breaks 
> the ZK client. So it does not clear up the old tokens and keeps building more 
> pressure on packet length.
> {noformat}
> Caused by: 
> org.apache.hadoop.hive.thrift.DelegationTokenStore$TokenStoreException: Error 
> deleting 
> /hivedelegationMETASTORE/tokens/AAVwaXBlcgVwaXBlckdoaXZlL2hhZG9vcHNlcnZpY2UyNS1waHgyLnByb2QudWJlci5pbnRlcm5hbEBEQVRBU1JFLlBST0QuVUJFUi5JTlRFUk5BTIoBa9ex1FWKAWv7vlhVjQGHdI4IUA
>         at 
> org.apache.hadoop.hive.thrift.ZooKeeperTokenStore.zkDelete(ZooKeeperTokenStore.java:340)
>         at 
> org.apache.hadoop.hive.thrift.ZooKeeperTokenStore.removeToken(ZooKeeperTokenStore.java:392)
>         at 
> org.apache.hadoop.hive.thrift.TokenStoreDelegationTokenSecretManager.cancelToken(TokenStoreDelegationTokenSecretManager.java:122)
>         at 
> org.apache.hadoop.hive.thrift.TokenStoreDelegationTokenSecretManager.cancelToken(TokenStoreDelegationTokenSecretManager.java:56)
>         at 
> org.apache.hadoop.hive.thrift.DelegationTokenSecretManager.cancelDelegationToken(DelegationTokenSecretManager.java:90)
>         at 
> org.apache.hadoop.hive.thrift.HiveDelegationTokenManager.cancelDelegationToken(HiveDelegationTokenManager.java:152)
>         at 
> org.apache.hadoop.hive.metastore.HiveMetaStore.cancelDelegationToken(HiveMetaStore.java:7049)
>         at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.cancel_delegation_token(HiveMetaStore.java:5916)
>         ... 20 more
> Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: 
> KeeperErrorCode = ConnectionLoss for 
> /hivedelegationMETASTORE/tokens/AAVwaXBlcgVwaXBlckdoaXZlL2hhZG9vcHNlcnZpY2UyNS1waHgyLnByb2QudWJlci5pbnRlcm5hbEBEQVRBU1JFLlBST0QuVUJFUi5JTlRFUk5BTIoBa9ex1FWKAWv7vlhVjQGHdI4IUA
>         at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>         at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>         at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:873)
>         at 
> org.apache.curator.framework.imps.DeleteBuilderImpl$5.call(DeleteBuilderImpl.java:238)
>         at 
> org.apache.curator.framework.imps.DeleteBuilderImpl$5.call(DeleteBuilderImpl.java:233)
>         at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
>         at 
> org.apache.curator.framework.imps.DeleteBuilderImpl.pathInForeground(DeleteBuilderImpl.java:230)
>         at 
> org.apache.curator.framework.imps.DeleteBuilderImpl.forPath(DeleteBuilderImpl.java:214)
>         at 
> org.apache.curator.framework.imps.DeleteBuilderImpl.forPath(DeleteBuilderImpl.java:41)
>         at 
> org.apache.hadoop.hive.thrift.ZooKeeperTokenStore.zkDelete(ZooKeeperTokenStore.java:336)
>         ... 27 more
> {noformat}
> And this process repeats every 5 seconds due to expiry thread setting
> https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/security/TokenStoreDelegationTokenSecretManager.java#L312
> {{lastTokenCacheCleanup}} never gets reset as well; so expiring tokens starts 
> happening every 5 seconds instead of every hour 
> https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/security/TokenStoreDelegationTokenSecretManager.java#L309
> Overall, system starts to run in degraded mode as this happens every 5 
> seconds. 
> 1. https://issues.apache.org/jira/browse/ZOOKEEPER-2933 tries to provide a 
> way to monitor "jute.maxBuffer". Not sure if curator frameworks make use of 
> this recently.
> 2. Another option is to update {{lastTokenCacheCleanup}} just before invoking 
> {{removeExpiredTokens}}. This way, it still honours the expiry removal 
> interval. In the exception, it would be good to add details about 
> {{jute.maxBuffer}}, so that users can configure it to a higher value during 
> next restart. 
> 3. Another option is to set {{jute.maxBuffer}} dynamically. 200 MB max limit 
> would allow to clear up ~1.6 million entries. Additional message in the 
> exception handling can give more info on this.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

Reply via email to