[ https://issues.apache.org/jira/browse/HADOOP-15487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16489733#comment-16489733 ]
Wei-Chiu Chuang commented on HADOOP-15487:
------------------------------------------

FYI, here is another one that looks eerily similar. This one is from a NameNode on a different cluster (CDH5.13.2, jdk1.8.0_74).

{noformat}
2018-05-20 14:01:35,314 INFO org.apache.hadoop.ipc.Server: Socket Reader #1 for port 8020: readAndProcess from client 192.168.30.37 threw exception [java.lang.IllegalStateException: This ticket is no longer valid]
java.lang.IllegalStateException: This ticket is no longer valid
    at javax.security.auth.kerberos.KerberosTicket.toString(KerberosTicket.java:638)
    at java.lang.String.valueOf(String.java:2994)
    at java.lang.StringBuilder.append(StringBuilder.java:131)
    at sun.security.jgss.krb5.SubjectComber.findAux(SubjectComber.java:171)
    at sun.security.jgss.krb5.SubjectComber.find(SubjectComber.java:61)
    at sun.security.jgss.krb5.ServiceCreds.getInstance(ServiceCreds.java:127)
    at sun.security.jgss.krb5.Krb5Util.getServiceCreds(Krb5Util.java:203)
    at sun.security.jgss.krb5.Krb5AcceptCredential$1.run(Krb5AcceptCredential.java:74)
    at sun.security.jgss.krb5.Krb5AcceptCredential$1.run(Krb5AcceptCredential.java:72)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.security.jgss.krb5.Krb5AcceptCredential.getInstance(Krb5AcceptCredential.java:71)
    at sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:127)
    at sun.security.jgss.GSSManagerImpl.getCredentialElement(GSSManagerImpl.java:193)
    at sun.security.jgss.GSSCredentialImpl.add(GSSCredentialImpl.java:427)
    at sun.security.jgss.GSSCredentialImpl.<init>(GSSCredentialImpl.java:62)
    at sun.security.jgss.GSSManagerImpl.createCredential(GSSManagerImpl.java:154)
    at com.sun.security.sasl.gsskerb.GssKrb5Server.<init>(GssKrb5Server.java:108)
    at com.sun.security.sasl.gsskerb.FactoryImpl.createSaslServer(FactoryImpl.java:85)
    at org.apache.hadoop.security.SaslRpcServer$FastSaslServerFactory.createSaslServer(SaslRpcServer.java:398)
    at org.apache.hadoop.security.SaslRpcServer$1.run(SaslRpcServer.java:164)
    at org.apache.hadoop.security.SaslRpcServer$1.run(SaslRpcServer.java:161)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
    at org.apache.hadoop.security.SaslRpcServer.create(SaslRpcServer.java:160)
    at org.apache.hadoop.ipc.Server$Connection.createSaslServer(Server.java:1742)
    at org.apache.hadoop.ipc.Server$Connection.processSaslMessage(Server.java:1522)
    at org.apache.hadoop.ipc.Server$Connection.saslProcess(Server.java:1433)
    at org.apache.hadoop.ipc.Server$Connection.saslReadAndProcess(Server.java:1396)
    at org.apache.hadoop.ipc.Server$Connection.processRpcOutOfBandRequest(Server.java:2080)
    at org.apache.hadoop.ipc.Server$Connection.processOneRpc(Server.java:1920)
    at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1682)
    at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:896)
    at org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:752)
    at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:723)
2018-05-20 14:01:35,385 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for u...@example.com (auth:KERBEROS)
2018-05-20 14:01:35,411 INFO SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager: Authorization successful for u...@example.com (auth:KERBEROS) for protocol=interface org.apache.hadoop.hdfs.protocol.ClientProtocol
2018-05-20 14:01:35,545 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hdfs/nn1.example....@example.com (auth:KERBEROS) cause:javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
2018-05-20 14:01:35,545 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hdfs/nn1.example....@example.com (auth:KERBEROS) cause:javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
2018-05-20 14:01:35,545 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hdfs/nn1.example....@example.com (auth:KERBEROS) cause:javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
2018-05-20 14:01:35,561 WARN org.apache.hadoop.security.UserGroupInformation: Not attempting to re-login since the last re-login was attempted less than 60 seconds before. Last Login=1526850095545
2018-05-20 14:01:35,562 WARN org.apache.hadoop.security.UserGroupInformation: Not attempting to re-login since the last re-login was attempted less than 60 seconds before. Last Login=1526850095545
{noformat}

Maybe there is a race between UGI.reloginFromKeytab() and SaslRpcServer$FastSaslServerFactory.createSaslServer()?

> ConcurrentModificationException resulting in Kerberos authentication error.
> ---------------------------------------------------------------------------
>
> Key: HADOOP-15487
> URL: https://issues.apache.org/jira/browse/HADOOP-15487
> Project: Hadoop Common
> Issue Type: Bug
> Environment: CDH 5.13.3. Kerberized, Hadoop-HA, jdk1.8.0_152
> Reporter: Wei-Chiu Chuang
> Priority: Major
>
> We found the following exception message in a NameNode log. It seems the
> ConcurrentModificationException caused a Kerberos authentication error.
> It appears to be a JDK bug, similar to HADOOP-13433 (Race in
> UGI.reloginFromKeytab), but this version of Hadoop (CDH5.13.3) already includes
> the HADOOP-13433 patch. (The stack trace also differs.) This cluster runs on JDK
> 1.8.0_152.
> {noformat}
> 2018-05-19 04:00:00,182 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hdfs/no...@example.com (auth:KERBEROS) cause:javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
> 2018-05-19 04:00:00,183 INFO org.apache.hadoop.ipc.Server: Socket Reader #1 for port 8020: readAndProcess from client 10.16.20.122 threw exception [java.util.ConcurrentModificationException]
> java.util.ConcurrentModificationException
>     at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:966)
>     at java.util.LinkedList$ListItr.next(LinkedList.java:888)
>     at javax.security.auth.Subject$SecureSet$1.next(Subject.java:1070)
>     at javax.security.auth.Subject$ClassSet$1.run(Subject.java:1401)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject$ClassSet.populateSet(Subject.java:1399)
>     at javax.security.auth.Subject$ClassSet.<init>(Subject.java:1372)
>     at javax.security.auth.Subject.getPrivateCredentials(Subject.java:767)
>     at sun.security.jgss.krb5.SubjectComber.findAux(SubjectComber.java:127)
>     at sun.security.jgss.krb5.SubjectComber.findMany(SubjectComber.java:69)
>     at sun.security.jgss.krb5.ServiceCreds.getInstance(ServiceCreds.java:96)
>     at sun.security.jgss.krb5.Krb5Util.getServiceCreds(Krb5Util.java:203)
>     at sun.security.jgss.krb5.Krb5AcceptCredential$1.run(Krb5AcceptCredential.java:74)
>     at sun.security.jgss.krb5.Krb5AcceptCredential$1.run(Krb5AcceptCredential.java:72)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at sun.security.jgss.krb5.Krb5AcceptCredential.getInstance(Krb5AcceptCredential.java:71)
>     at sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:127)
>     at sun.security.jgss.GSSManagerImpl.getCredentialElement(GSSManagerImpl.java:193)
>     at sun.security.jgss.GSSCredentialImpl.add(GSSCredentialImpl.java:427)
>     at sun.security.jgss.GSSCredentialImpl.<init>(GSSCredentialImpl.java:62)
>     at sun.security.jgss.GSSManagerImpl.createCredential(GSSManagerImpl.java:154)
>     at com.sun.security.sasl.gsskerb.GssKrb5Server.<init>(GssKrb5Server.java:108)
>     at com.sun.security.sasl.gsskerb.FactoryImpl.createSaslServer(FactoryImpl.java:85)
>     at org.apache.hadoop.security.SaslRpcServer$FastSaslServerFactory.createSaslServer(SaslRpcServer.java:398)
>     at org.apache.hadoop.security.SaslRpcServer$1.run(SaslRpcServer.java:164)
>     at org.apache.hadoop.security.SaslRpcServer$1.run(SaslRpcServer.java:161)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
>     at org.apache.hadoop.security.SaslRpcServer.create(SaslRpcServer.java:160)
>     at org.apache.hadoop.ipc.Server$Connection.createSaslServer(Server.java:1742)
>     at org.apache.hadoop.ipc.Server$Connection.processSaslMessage(Server.java:1522)
>     at org.apache.hadoop.ipc.Server$Connection.saslProcess(Server.java:1433)
>     at org.apache.hadoop.ipc.Server$Connection.saslReadAndProcess(Server.java:1396)
>     at org.apache.hadoop.ipc.Server$Connection.processRpcOutOfBandRequest(Server.java:2080)
>     at org.apache.hadoop.ipc.Server$Connection.processOneRpc(Server.java:1920)
>     at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1682)
>     at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:896)
>     at org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:752)
>     at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:723)
> {noformat}
> We saw a few GSSExceptions in the NN log, but only one threw the
> ConcurrentModificationException. This NN had a failover, which was caused by
> ZKFC hitting a GSSException too. I suspect it is a related issue.
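
For illustration only, the failure mode in the quoted stack trace (LinkedList$ListItr.checkForComodification reached from Subject.getPrivateCredentials) can be reproduced in isolation. The sketch below is not Hadoop or JDK code: the class name ComodificationSketch, its methods, and the "ticket-N" strings are hypothetical stand-ins, and a plain LinkedList is assumed in place of the Subject's credential set. The structural change is made on the same thread so the result is deterministic; in the reported case it would presumably come from a concurrent relogin.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

public class ComodificationSketch {

    // Builds a hypothetical stand-in for the Subject's private credential list.
    static List<String> newCredentialList() {
        List<String> creds = new LinkedList<>();
        creds.add("ticket-1");
        creds.add("ticket-2");
        return creds;
    }

    // Starts a scan, changes the list mid-scan (as a relogin replacing tickets
    // might), then continues. Returns true if the fail-fast iterator threw.
    static boolean unsafeScanFails() {
        List<String> creds = newCredentialList();
        Iterator<String> it = creds.iterator();
        it.next();
        creds.add("ticket-3"); // structural modification during iteration
        try {
            it.next();
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // One possible mitigation (an assumption, not the project's fix): copy the
    // list under a lock and scan the copy. Returns the number of entries seen.
    static int snapshotScan() {
        List<String> creds = newCredentialList();
        List<String> snapshot;
        synchronized (creds) {
            snapshot = new ArrayList<>(creds);
        }
        creds.add("ticket-3"); // no longer affects the scan below
        int seen = 0;
        for (String ignored : snapshot) {
            seen++;
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println("unsafe scan failed: " + unsafeScanFails());
        System.out.println("entries seen in snapshot: " + snapshotScan());
    }
}
```

The snapshot variant only helps if every writer takes the same lock, which is why a fix on the reader side alone would probably not be sufficient.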