[ https://issues.apache.org/jira/browse/ZOOKEEPER-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379813#comment-17379813 ]
Emil Kleszcz commented on ZOOKEEPER-4334:
-----------------------------------------

Thanks for your prompt reply, [~eolivelli]. It seems you are correct, since I couldn't get it fully running with the aliases used in the keytab principals. If that's not supported, I would suggest adding a note about it to the ZK admin guide; if it is supposed to work, we could double-check why it works only partially. By partially, I mean the aliases work with server.X but not when external SSL is enabled.

I forgot to mention that I am also using auth_to_local, and the alias-based hosts are listed there too.

> SASL authentication fails when using host aliases
> -------------------------------------------------
>
>                 Key: ZOOKEEPER-4334
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4334
>             Project: ZooKeeper
>          Issue Type: Bug
>   Affects Versions: 3.6.1
>            Reporter: Emil Kleszcz
>            Priority: Critical
>
> I faced an issue while trying to use alternative aliases with the ZooKeeper quorum when SASL is enabled.
> The errors I get in the zookeeper log are the following:
> ```
> 2021-07-12 21:04:46,437 [myid:3] - WARN [NIOWorkerThread-3:ZooKeeperServer@1661] - Client /<IP addr>:37368 failed to SASL authenticate: {}
> javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: Failure unspecified at GSS-API level (Mechanism level: Checksum failed)]
>   at com.sun.security.sasl.gsskerb.GssKrb5Server.evaluateResponse(GssKrb5Server.java:199)
>   at org.apache.zookeeper.server.ZooKeeperSaslServer.evaluateResponse(ZooKeeperSaslServer.java:49)
>   at org.apache.zookeeper.server.ZooKeeperServer.processSasl(ZooKeeperServer.java:1650)
>   at org.apache.zookeeper.server.ZooKeeperServer.processPacket(ZooKeeperServer.java:1599)
>   at org.apache.zookeeper.server.NIOServerCnxn.readRequest(NIOServerCnxn.java:379)
>   at org.apache.zookeeper.server.NIOServerCnxn.readPayload(NIOServerCnxn.java:182)
>   at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:339)
>   at org.apache.zookeeper.server.NIOServerCnxnFactory$IOWorkRequest.doWork(NIOServerCnxnFactory.java:522)
>   at org.apache.zookeeper.server.WorkerService$ScheduledWorkRequest.run(WorkerService.java:154)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: GSSException: Failure unspecified at GSS-API level (Mechanism level: Checksum failed)
>   at sun.security.jgss.krb5.Krb5Context.acceptSecContext(Krb5Context.java:856)
>   at sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:342)
>   at sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:285)
>   at com.sun.security.sasl.gsskerb.GssKrb5Server.evaluateResponse(GssKrb5Server.java:167)
>   ... 11 more
> Caused by: KrbException: Checksum failed
>   at sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType.decrypt(Aes256CtsHmacSha1EType.java:102)
>   at sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType.decrypt(Aes256CtsHmacSha1EType.java:94)
>   at sun.security.krb5.EncryptedData.decrypt(EncryptedData.java:175)
>   at sun.security.krb5.KrbApReq.authenticate(KrbApReq.java:281)
>   at sun.security.krb5.KrbApReq.<init>(KrbApReq.java:149)
>   at sun.security.jgss.krb5.InitSecContextToken.<init>(InitSecContextToken.java:108)
>   at sun.security.jgss.krb5.Krb5Context.acceptSecContext(Krb5Context.java:829)
>   ... 14 more
> Caused by: java.security.GeneralSecurityException: Checksum failed
>   at sun.security.krb5.internal.crypto.dk.AesDkCrypto.decryptCTS(AesDkCrypto.java:451)
>   at sun.security.krb5.internal.crypto.dk.AesDkCrypto.decrypt(AesDkCrypto.java:272)
>   at sun.security.krb5.internal.crypto.Aes256.decrypt(Aes256.java:76)
>   at sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType.decrypt(Aes256CtsHmacSha1EType.java:100)
>   ... 20 more
> ```
> What did I do?
> 1) Created host aliases for each quorum node (a, b, c): zk1, zk2, zk3
> 2) Changed zoo.cfg from:
> server.1=a
> server.2=b
> server.3=c
> to:
> server.1=zk1
> server.2=zk2
> server.3=zk3
> (At this stage, after restarting the ensemble, all works as expected.)
> 3) Generated a new keytab (zookeeper.keytab) containing both alias-based and host-based principals
> 4) Changed the jaas.conf (server) definition from:
> Server {
>   com.sun.security.auth.module.Krb5LoginModule required
>   useKeyTab=true
>   keyTab="/etc/zookeeper/conf/zookeeper.keytab"
>   storeKey=true
>   useTicketCache=false
>   principal="zookeeper/a.com@COM";
> };
> to:
> Server {
>   com.sun.security.auth.module.Krb5LoginModule required
>   useKeyTab=true
>   keyTab="/etc/zookeeper/conf/zookeeper.keytab"
>   storeKey=true
>   useTicketCache=false
>   principal="zookeeper/zk1.com@COM";
> };
> From that moment, after restarting the quorum members, I get the above error.
>
> Now, why do I do this?
> To allow other services such as zkfc, hbase, hdfs and yarn to connect to the quorum using aliases. Interestingly, without changing the zookeeper principal, hbase works perfectly, but the other 3 services fail with:
> ```
> <2021-07-12T20:45:19.491+0200> <INFO> <org.apache.zookeeper.ZooKeeper>: <Initiating client connection, connectString=zk01.com:2181,zk02.com:2181,zk03.com:2181 sessionTimeout=10000 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@3246fb96>
> <2021-07-12T20:45:19.519+0200> <INFO> <org.apache.zookeeper.Login>: <Client successfully logged in.>
> <2021-07-12T20:45:19.521+0200> <INFO> <org.apache.zookeeper.Login>: <TGT refresh thread started.>
> <2021-07-12T20:45:19.524+0200> <INFO> <org.apache.zookeeper.Login>: <TGT valid starting at: Mon Jul 12 20:45:19 CEST 2021>
> <2021-07-12T20:45:19.524+0200> <INFO> <org.apache.zookeeper.Login>: <TGT expires: Tue Jul 13 21:45:19 CEST 2021>
> <2021-07-12T20:45:19.524+0200> <INFO> <org.apache.zookeeper.Login>: <TGT refresh sleeping until: Tue Jul 13 17:05:16 CEST 2021>
> <2021-07-12T20:45:19.524+0200> <INFO> <org.apache.zookeeper.client.ZooKeeperSaslClient>: <Client will use GSSAPI as SASL mechanism.>
> <2021-07-12T20:45:19.530+0200> <INFO> <org.apache.zookeeper.ClientCnxn>: <Opening socket connection to server zk02.com/<ip addr>:2181. Will attempt to SASL-authenticate using Login Context section 'Client'>
> <2021-07-12T20:45:19.535+0200> <INFO> <org.apache.zookeeper.ClientCnxn>: <Socket connection established to zk02.com/<ip addr>:2181, initiating session>
> <2021-07-12T20:45:19.543+0200> <INFO> <org.apache.zookeeper.ClientCnxn>: <Session establishment complete on server zk02.com/<ip addr>:2181, sessionid = 0x200247870fb0007, negotiated timeout = 10000>
> <2021-07-12T20:45:19.561+0200> <ERROR> <org.apache.zookeeper.client.ZooKeeperSaslClient>: <SASL authentication failed using login context 'Client' with exception: {}>
> javax.security.sasl.SaslException: Error in authenticating with a Zookeeper Quorum member: the quorum member's saslToken is null.
>   at org.apache.zookeeper.client.ZooKeeperSaslClient.createSaslToken(ZooKeeperSaslClient.java:279)
>   at org.apache.zookeeper.client.ZooKeeperSaslClient.respondToServer(ZooKeeperSaslClient.java:242)
>   at org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:805)
>   at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:94)
>   at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366)
>   at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1145)
> <2021-07-12T20:45:19.564+0200> <INFO> <org.apache.zookeeper.ClientCnxn>: <Unable to read additional data from server sessionid 0x200247870fb0007, likely server has closed socket, closing socket connection and attempting reconnect>
> <2021-07-12T20:45:19.671+0200> <INFO> <org.apache.hadoop.ha.ActiveStandbyElector>: <Session connected.>
> <2021-07-12T20:45:19.672+0200> <ERROR> <org.apache.hadoop.hdfs.tools.DFSZKFailoverController>: <DFSZKFailOverController exiting due to earlier exception java.io.IOException: Couldn't determine existence of znode
> ```
> When I change the zookeeper principal, hbase starts failing with this error, while the other services, except for zookeeper itself, somehow work fine. After that, I cannot connect manually to the zk quorum using zkCli or zookeeper-client with any combination of principals.
>
> I wonder if that may have something to do with "Server environment:host.name=" pointing to the canonical name (and not the alias) during startup. The same happens after specifying the alias with clientPortAddress=.
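
One way to test the host.name hypothesis outside ZooKeeper is to compare the service principal built from the alias as typed with the one built from the name the JVM reports as canonical. The following is only a minimal sketch: the class name `PrincipalCheck` and its two helper methods are hypothetical, the default arguments (`localhost`, realm `COM`) are placeholders, and it does not reproduce ZooKeeper's own principal selection logic. It uses only `java.net.InetAddress` from the JDK.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class PrincipalCheck {

    // Builds a Kerberos service principal of the form service/host@REALM.
    static String servicePrincipal(String service, String host, String realm) {
        return service + "/" + host + "@" + realm;
    }

    // Resolves the given name and returns what the JVM considers its
    // canonical (fully qualified) host name; the result depends on local DNS.
    static String canonicalName(String host) throws UnknownHostException {
        return InetAddress.getByName(host).getCanonicalHostName();
    }

    public static void main(String[] args) throws UnknownHostException {
        // Placeholders: pass the alias (e.g. zk1.com) and the realm as arguments.
        String alias = args.length > 0 ? args[0] : "localhost";
        String realm = args.length > 1 ? args[1] : "COM";

        String canonical = canonicalName(alias);
        System.out.println("as typed:  " + servicePrincipal("zookeeper", alias, realm));
        System.out.println("canonical: " + servicePrincipal("zookeeper", canonical, realm));

        if (!alias.equalsIgnoreCase(canonical)) {
            // A ticket requested for one name cannot be decrypted with the
            // key of the other, which would be consistent with "Checksum failed".
            System.out.println("Alias and canonical name differ.");
        }
    }
}
```

If the two printed principals differ for zk1, zk2 or zk3, a client that canonicalizes the host name would request a ticket for the host-based principal while the server logs in with the alias-based one. In that case it may also be worth checking the client-side `zookeeper.sasl.client.canonicalize.hostname` property described in the ZooKeeper programmer's guide.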