Nicolas Fraison created HADOOP-14252:
----------------------------------------
Summary: Nodemanagers have DDoS our namenode due to
HDFS_DELEGATION_TOKEN expired or not in the cache
Key: HADOOP-14252
URL: https://issues.apache.org/jira/browse/HADOOP-14252
Project: Hadoop Common
Issue Type: Bug
Components: hdfs-client
Affects Versions: 2.6.0
Environment: Releases:
Cloudera release CDH 5.5.0
openjdk version "1.8.0_91"
Linux CentOS 6 servers
Cluster info:
NameNode and ResourceManager in HA with Kerberos authentication
More than 1300 datanodes/nodemanagers
Reporter: Nicolas Fraison
Priority: Minor
We experienced severe slowdowns on our namenode because all of our nodemanagers
kept retrying to renew a lease, reconnecting to the namenode every second for
one hour, after an HDFS_DELEGATION_TOKEN had expired or could no longer be found
in the cache.
Throughout this period, the number of TIME_WAIT connections on our namenode
stayed pinned at the configured maximum of 250k, because every retry opened a
new connection.
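As a rough illustration of the load, the figures in this report can be turned into a back-of-the-envelope estimate. The sketch assumes exactly one lease-renewing client per nodemanager and one new connection per renewal attempt. Both are assumptions, not facts taken from the logs. Because the cluster has more than 1300 nodemanagers, the result is a lower bound.

```java
/**
 * Back-of-the-envelope estimate of the reconnection load described above.
 * Assumption: one lease-renewing client per nodemanager, and one new TCP
 * connection per renewal attempt.
 */
public class RetryLoadEstimate {
    static final int NODEMANAGERS = 1300;        // "More than 1300 datanodes/nodemanagers"
    static final int RETRY_INTERVAL_SECONDS = 1; // one renewal attempt per second
    static final int DURATION_SECONDS = 3600;    // the retries lasted about one hour

    /** Renewal attempts (and thus new connections) reaching the namenode per second. */
    public static long attemptsPerSecond(int clients, int retryIntervalSeconds) {
        return clients / retryIntervalSeconds;
    }

    /** Total renewal attempts over the whole retry window. */
    public static long totalAttempts(int clients, int retryIntervalSeconds, int durationSeconds) {
        return (long) clients * (durationSeconds / retryIntervalSeconds);
    }

    public static void main(String[] args) {
        System.out.println("attempts per second: "
                + attemptsPerSecond(NODEMANAGERS, RETRY_INTERVAL_SECONDS));
        System.out.println("attempts per hour: "
                + totalAttempts(NODEMANAGERS, RETRY_INTERVAL_SECONDS, DURATION_SECONDS));
    }
}
```

Run as-is, it prints 1300 attempts per second and 4680000 attempts per hour.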
{code}
2017-03-02 11:51:42,817 INFO SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager: Authorization successful for appattempt_1488396860014_156103_000001 (auth:TOKEN) for protocol=interface org.apache.hadoop.yarn.api.ContainerManagementProtocolPB
2017-03-02 11:51:43,414 INFO SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager: Authorization successful for appattempt_1488396860014_156120_000001 (auth:TOKEN) for protocol=interface org.apache.hadoop.yarn.api.ContainerManagementProtocolPB
2017-03-02 11:51:51,994 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:prediction (auth:SIMPLE) cause:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (HDFS_DELEGATION_TOKEN token 111018676 for prediction) is expired
2017-03-02 11:51:51,995 WARN org.apache.hadoop.ipc.Client: Exception encountered while connecting to the server : org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (HDFS_DELEGATION_TOKEN token 111018676 for prediction) is expired
2017-03-02 11:51:51,995 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:prediction (auth:SIMPLE) cause:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (HDFS_DELEGATION_TOKEN token 111018676 for prediction) is expired
2017-03-02 11:51:51,995 WARN org.apache.hadoop.hdfs.LeaseRenewer: Failed to renew lease for [DFSClient_NONMAPREDUCE_1560141256_4187204] for 30 seconds. Will retry shortly ...
token (HDFS_DELEGATION_TOKEN token 111018676 for prediction) is expired
	at org.apache.hadoop.ipc.Client.call(Client.java:1472)
	at org.apache.hadoop.ipc.Client.call(Client.java:1403)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
	at com.sun.proxy.$Proxy20.renewLease(Unknown Source)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.renewLease(ClientNamenodeProtocolTranslatorPB.java:571)
	at sun.reflect.GeneratedMethodAccessor74.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:252)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
	at com.sun.proxy.$Proxy21.renewLease(Unknown Source)
	at org.apache.hadoop.hdfs.DFSClient.renewLease(DFSClient.java:921)
	at org.apache.hadoop.hdfs.LeaseRenewer.renew(LeaseRenewer.java:423)
	at org.apache.hadoop.hdfs.LeaseRenewer.run(LeaseRenewer.java:448)
	at org.apache.hadoop.hdfs.LeaseRenewer.access$700(LeaseRenewer.java:71)
	at org.apache.hadoop.hdfs.LeaseRenewer$1.run(LeaseRenewer.java:304)
	at java.lang.Thread.run(Thread.java:745)
2017-03-02 12:51:22,032 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:prediction (auth:SIMPLE) cause:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (HDFS_DELEGATION_TOKEN token 111018676 for prediction) can't be found in cache
2017-03-02 12:51:22,032 WARN org.apache.hadoop.ipc.Client: Exception encountered while connecting to the server : org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (HDFS_DELEGATION_TOKEN token 111018676 for prediction) can't be found in cache
2017-03-02 12:51:22,033 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:prediction (auth:SIMPLE) cause:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (HDFS_DELEGATION_TOKEN token 111018676 for prediction) can't be found in cache
2017-03-02 12:51:22,033 WARN org.apache.hadoop.hdfs.DFSClient: Failed to renew lease for DFSClient_NONMAPREDUCE_1560141256_4187204 for 3600 seconds (>= hard-limit =3600 seconds.) Closing all files being written ...
token (HDFS_DELEGATION_TOKEN token 111018676 for prediction) can't be found in cache
	at org.apache.hadoop.ipc.Client.call(Client.java:1472)
	at org.apache.hadoop.ipc.Client.call(Client.java:1403)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
	at com.sun.proxy.$Proxy20.renewLease(Unknown Source)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.renewLease(ClientNamenodeProtocolTranslatorPB.java:571)
	at sun.reflect.GeneratedMethodAccessor74.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:252)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
	at com.sun.proxy.$Proxy21.renewLease(Unknown Source)
	at org.apache.hadoop.hdfs.DFSClient.renewLease(DFSClient.java:921)
	at org.apache.hadoop.hdfs.LeaseRenewer.renew(LeaseRenewer.java:423)
	at org.apache.hadoop.hdfs.LeaseRenewer.run(LeaseRenewer.java:448)
	at org.apache.hadoop.hdfs.LeaseRenewer.access$700(LeaseRenewer.java:71)
	at org.apache.hadoop.hdfs.LeaseRenewer$1.run(LeaseRenewer.java:304)
	at java.lang.Thread.run(Thread.java:745)
2017-03-02 12:51:27,364 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl: rollingMonitorInterval is set as -1. The log rolling mornitoring interval is disabled. The logs will be aggregated after this application is finished.
{code}
The root cause was the removal of the YARN proxy configuration, which left the
ResourceManager unable to renew the HDFS_DELEGATION_TOKEN.
Even though the root cause has been identified, I don't think it is normal for
the client to keep retrying the lease renewal every second for an hour when the
token has expired or cannot be found in the cache: retrying cannot recover from
this error.
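A minimal sketch of the behaviour argued for above, written as plain Java so it runs without Hadoop on the classpath. Everything in it is hypothetical: `FailFastLeaseRenewer`, `LeaseClient` and `InvalidTokenException` are illustrative stand-ins, not Hadoop's `LeaseRenewer` or `SecretManager.InvalidToken`. Transient I/O errors are retried at a fixed interval until a hard limit (3600 seconds in the log above), but a token error anywhere in the cause chain aborts at once. In the real client the server-side `InvalidToken` reaches the caller wrapped in a `RemoteException`, as the log shows, so an actual fix would first have to unwrap it.

```java
import java.io.IOException;

/**
 * Hypothetical sketch (not Hadoop code): retry lease renewal on transient
 * errors, but give up immediately on token errors that retrying cannot fix.
 */
public class FailFastLeaseRenewer {

    /** Hypothetical stand-in for SecretManager.InvalidToken. */
    public static class InvalidTokenException extends IOException {
        public InvalidTokenException(String message) {
            super(message);
        }
    }

    /** Hypothetical abstraction over the renewLease RPC to the namenode. */
    public interface LeaseClient {
        void renewLease() throws IOException;
    }

    public enum Outcome { RENEWED, ABORTED_INVALID_TOKEN, ABORTED_HARD_LIMIT, INTERRUPTED }

    private final LeaseClient client;
    private final long hardLimitMillis;
    private final long retryIntervalMillis;

    public FailFastLeaseRenewer(LeaseClient client, long hardLimitMillis, long retryIntervalMillis) {
        this.client = client;
        this.hardLimitMillis = hardLimitMillis;
        this.retryIntervalMillis = retryIntervalMillis;
    }

    /** An error is retriable unless a token error appears anywhere in its cause chain. */
    static boolean isRetriable(Throwable error) {
        for (Throwable t = error; t != null; t = t.getCause()) {
            if (t instanceof InvalidTokenException) {
                return false;
            }
        }
        return true;
    }

    /** Calls renewLease until it succeeds, a token error occurs, or the hard limit passes. */
    public Outcome renew() {
        long startNanos = System.nanoTime();
        while (true) {
            try {
                client.renewLease();
                return Outcome.RENEWED;
            } catch (IOException e) {
                if (!isRetriable(e)) {
                    // Expired or unknown token: every further attempt would fail the same way.
                    return Outcome.ABORTED_INVALID_TOKEN;
                }
                long elapsedMillis = (System.nanoTime() - startNanos) / 1_000_000L;
                if (elapsedMillis >= hardLimitMillis) {
                    return Outcome.ABORTED_HARD_LIMIT;
                }
            }
            try {
                Thread.sleep(retryIntervalMillis); // fixed interval between transient retries
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return Outcome.INTERRUPTED;
            }
        }
    }
}
```

Returning an outcome instead of throwing keeps the sketch small; a real client would still have to decide what to do with files open for writing, which the hard-limit path in the log closes.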
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)