[ https://issues.apache.org/jira/browse/ZOOKEEPER-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17389792#comment-17389792 ]
Damien Diederen commented on ZOOKEEPER-4334:
--------------------------------------------

[~ekleszcz] wrote:

bq. that won't solve the problem as the change considers only the SASL auth between the quorum members and my case regards the Java client to server auth.

Ah, right; I just spotted "the quorum member's saslToken is null," saw that you were using keytabs, assumed this was about quorum auth, and thought I'd mention ZOOKEEPER-4030.

bq. I have just discovered the extra flag: {{zookeeper.sasl.client.canonicalize.hostname}}. This means that by default we have to strictly use the canonical names for the principals. What I would like to achieve instead is to define the aliases in the principals. […] Tested and it keeps failing […]

Right. As [~eolivelli] mentions, Kerberos implementations tend to be bound to "real" names, as returned by reverse DNS resolution. ZooKeeper (client-to-server, and now server-to-server) supports referencing members using aliases, but the correct tickets still have to be provided.

My understanding is that this is a Kerberos limitation, not a ZooKeeper issue. You are of course welcome to suggest a workaround if you find one, but I would otherwise suggest amending or closing this ticket.

> SASL authentication fails when using host aliases
> -------------------------------------------------
>
>                 Key: ZOOKEEPER-4334
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4334
>             Project: ZooKeeper
>          Issue Type: Bug
>    Affects Versions: 3.6.1
>            Reporter: Emil Kleszcz
>            Priority: Critical
>
> I faced an issue while trying to use alternative aliases with Zookeeper quorum when SASL is enabled.
> The errors I get in zookeeper log are the following:
> ```
> 2021-07-12 21:04:46,437 [myid:3] - WARN [NIOWorkerThread-3:ZooKeeperServer@1661] - Client /<IP addr>:37368 failed to SASL authenticate: {}
> javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: Failure unspecified at GSS-API level (Mechanism level: Checksum failed)]
>     at com.sun.security.sasl.gsskerb.GssKrb5Server.evaluateResponse(GssKrb5Server.java:199)
>     at org.apache.zookeeper.server.ZooKeeperSaslServer.evaluateResponse(ZooKeeperSaslServer.java:49)
>     at org.apache.zookeeper.server.ZooKeeperServer.processSasl(ZooKeeperServer.java:1650)
>     at org.apache.zookeeper.server.ZooKeeperServer.processPacket(ZooKeeperServer.java:1599)
>     at org.apache.zookeeper.server.NIOServerCnxn.readRequest(NIOServerCnxn.java:379)
>     at org.apache.zookeeper.server.NIOServerCnxn.readPayload(NIOServerCnxn.java:182)
>     at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:339)
>     at org.apache.zookeeper.server.NIOServerCnxnFactory$IOWorkRequest.doWork(NIOServerCnxnFactory.java:522)
>     at org.apache.zookeeper.server.WorkerService$ScheduledWorkRequest.run(WorkerService.java:154)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: GSSException: Failure unspecified at GSS-API level (Mechanism level: Checksum failed)
>     at sun.security.jgss.krb5.Krb5Context.acceptSecContext(Krb5Context.java:856)
>     at sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:342)
>     at sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:285)
>     at com.sun.security.sasl.gsskerb.GssKrb5Server.evaluateResponse(GssKrb5Server.java:167)
>     ... 11 more
> Caused by: KrbException: Checksum failed
>     at sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType.decrypt(Aes256CtsHmacSha1EType.java:102)
>     at sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType.decrypt(Aes256CtsHmacSha1EType.java:94)
>     at sun.security.krb5.EncryptedData.decrypt(EncryptedData.java:175)
>     at sun.security.krb5.KrbApReq.authenticate(KrbApReq.java:281)
>     at sun.security.krb5.KrbApReq.<init>(KrbApReq.java:149)
>     at sun.security.jgss.krb5.InitSecContextToken.<init>(InitSecContextToken.java:108)
>     at sun.security.jgss.krb5.Krb5Context.acceptSecContext(Krb5Context.java:829)
>     ... 14 more
> Caused by: java.security.GeneralSecurityException: Checksum failed
>     at sun.security.krb5.internal.crypto.dk.AesDkCrypto.decryptCTS(AesDkCrypto.java:451)
>     at sun.security.krb5.internal.crypto.dk.AesDkCrypto.decrypt(AesDkCrypto.java:272)
>     at sun.security.krb5.internal.crypto.Aes256.decrypt(Aes256.java:76)
>     at sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType.decrypt(Aes256CtsHmacSha1EType.java:100)
>     ... 20 more
> ```
> What did I do?
> 1) Created host aliases for each quorum node (a,b,c): zk1, zk2, zk3
> 2) Changed in zoo.cfg:
> changed from
> server.1=a
> server.2=b
> server.3=c
> to:
> server.1=zk1
> server.2=zk2
> server.3=zk3
> (At this stage, after restarting the ensemble, all works as expected.)
> 3) Generated a new keytab with alias-based principals and host-based principals in zookeeper.keytab
> 4) Changed the jaas.conf (server) definition from:
> Server {
>   com.sun.security.auth.module.Krb5LoginModule required
>   useKeyTab=true
>   keyTab="/etc/zookeeper/conf/zookeeper.keytab"
>   storeKey=true
>   useTicketCache=false
>   principal="zookeeper/a.com@COM";
> };
> to
> Server {
>   com.sun.security.auth.module.Krb5LoginModule required
>   useKeyTab=true
>   keyTab="/etc/zookeeper/conf/zookeeper.keytab"
>   storeKey=true
>   useTicketCache=false
>   principal="zookeeper/zk1.com@COM";
> };
> From that moment, after restarting quorum members, I get the above error.
> Now, why do I do this?
> To allow other services such as zkfc, hbase, hdfs, yarn to connect to the quorum using aliases. Interestingly, without changing the zookeeper principal, hbase works perfectly, but the other 3 services fail with:
> ```
> <2021-07-12T20:45:19.491+0200> <INFO> <org.apache.zookeeper.ZooKeeper>: <Initiating client connection, connectString=zk01.com:2181,zk02.com:2181,zk03.com:2181 sessionTimeout=10000 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@3246fb96>
> <2021-07-12T20:45:19.519+0200> <INFO> <org.apache.zookeeper.Login>: <Client successfully logged in.>
> <2021-07-12T20:45:19.521+0200> <INFO> <org.apache.zookeeper.Login>: <TGT refresh thread started.>
> <2021-07-12T20:45:19.524+0200> <INFO> <org.apache.zookeeper.Login>: <TGT valid starting at: Mon Jul 12 20:45:19 CEST 2021>
> <2021-07-12T20:45:19.524+0200> <INFO> <org.apache.zookeeper.Login>: <TGT expires: Tue Jul 13 21:45:19 CEST 2021>
> <2021-07-12T20:45:19.524+0200> <INFO> <org.apache.zookeeper.Login>: <TGT refresh sleeping until: Tue Jul 13 17:05:16 CEST 2021>
> <2021-07-12T20:45:19.524+0200> <INFO> <org.apache.zookeeper.client.ZooKeeperSaslClient>: <Client will use GSSAPI as SASL mechanism.>
> <2021-07-12T20:45:19.530+0200> <INFO> <org.apache.zookeeper.ClientCnxn>: <Opening socket connection to server zk02.com/<ip addr>:2181. Will attempt to SASL-authenticate using Login Context section 'Client'>
> <2021-07-12T20:45:19.535+0200> <INFO> <org.apache.zookeeper.ClientCnxn>: <Socket connection established to zk02.com/<ip addr>:2181, initiating session>
> <2021-07-12T20:45:19.543+0200> <INFO> <org.apache.zookeeper.ClientCnxn>: <Session establishment complete on server zk02.com/<ip addr>:2181, sessionid = 0x200247870fb0007, negotiated timeout = 10000>
> <2021-07-12T20:45:19.561+0200> <ERROR> <org.apache.zookeeper.client.ZooKeeperSaslClient>: <SASL authentication failed using login context 'Client' with exception: {}>
> javax.security.sasl.SaslException: Error in authenticating with a Zookeeper Quorum member: the quorum member's saslToken is null.
>     at org.apache.zookeeper.client.ZooKeeperSaslClient.createSaslToken(ZooKeeperSaslClient.java:279)
>     at org.apache.zookeeper.client.ZooKeeperSaslClient.respondToServer(ZooKeeperSaslClient.java:242)
>     at org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:805)
>     at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:94)
>     at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366)
>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1145)
> <2021-07-12T20:45:19.564+0200> <INFO> <org.apache.zookeeper.ClientCnxn>: <Unable to read additional data from server sessionid 0x200247870fb0007, likely server has closed socket, closing socket connection and attempting reconnect>
> <2021-07-12T20:45:19.671+0200> <INFO> <org.apache.hadoop.ha.ActiveStandbyElector>: <Session connected.>
> <2021-07-12T20:45:19.672+0200> <ERROR> <org.apache.hadoop.hdfs.tools.DFSZKFailoverController>: <DFSZKFailOverController exiting due to earlier exception java.io.IOException: Couldn't determine existence of znode
> ```
> When I change the principal of zookeeper, hbase starts failing with this error, and the other services, except for zookeeper itself, are somehow working fine. After that, I cannot connect manually to the zk quorum using zkCli and zookeeper-client with all possible combinations of principals.
> I wonder if that may have something to do with the "Server environment:host.name=" pointing to the canonical name (and not the alias) during the startup. The same happens after specifying the alias with clientPortAddress=.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
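
To make the canonicalization point from the comment concrete, here is a minimal sketch of how the choice between an alias and a canonical name changes the service principal a client would ask Kerberos for. This is not ZooKeeper's implementation: `PrincipalSketch`, `serverPrincipal`, `dnsCanonicalName` and the `fakeDns` mapping are hypothetical names introduced only for illustration, and the alias-to-host pairing (zk1.com for a.com) is assumed from the principals shown in the ticket.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.function.UnaryOperator;

/**
 * Illustrative only; NOT ZooKeeper's code. Shows how hostname
 * canonicalization affects the service principal a client requests.
 */
final class PrincipalSketch {

    /** Builds "service/host@REALM", optionally canonicalizing the host first. */
    static String serverPrincipal(String service, String connectHost, String realm,
                                  boolean canonicalize, UnaryOperator<String> resolver) {
        String host = canonicalize ? resolver.apply(connectHost) : connectHost;
        return service + "/" + host + "@" + realm;
    }

    /** JDK-backed resolver: forward lookup, then the canonical name of that address. */
    static String dnsCanonicalName(String host) {
        try {
            return InetAddress.getByName(host).getCanonicalHostName();
        } catch (UnknownHostException e) {
            return host; // fall back to the name exactly as given
        }
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for DNS: the alias zk1.com points at the real host a.com.
        UnaryOperator<String> fakeDns = h -> h.equals("zk1.com") ? "a.com" : h;

        // Canonicalization on: the ticket is requested for the "real" name.
        System.out.println(serverPrincipal("zookeeper", "zk1.com", "COM", true, fakeDns));
        // Canonicalization off: the ticket is requested for the alias as typed.
        System.out.println(serverPrincipal("zookeeper", "zk1.com", "COM", false, fakeDns));
    }
}
```

With the stand-in mapping, the first call yields `zookeeper/a.com@COM` and the second `zookeeper/zk1.com@COM`. Whichever name the client ends up with, the server has to hold a key for exactly that principal, which is the "correct tickets still have to be provided" constraint described above. Passing `PrincipalSketch::dnsCanonicalName` as the resolver would use real DNS instead of the fake mapping.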