ZeonHuang opened a new issue, #14439: URL: https://github.com/apache/dolphinscheduler/issues/14439
### Search before asking

- [X] I had searched in the [issues](https://github.com/apache/dolphinscheduler/issues?q=is%3Aissue) and found no similar issues.

### What happened

I have been using a **DolphinScheduler cluster** for several months. The worker-server periodically fails to **authenticate with Kerberos** and has to be restarted to recover. Details below:

Application Version: **3.0.2**
Run mode: Cluster
HDFS: Built by CDH5, version should be 2.6.0

**Kerberos config in worker-server:**

```
# whether to startup kerberos
hadoop.security.authentication.startup.state=true

# java.security.krb5.conf path
java.security.krb5.conf.path=/etc/krb5.conf

# login user from keytab username
login.user.keytab.username=hdfs/[email protected]

# login user from keytab path
login.user.keytab.path=/etc/security/keytab/researchdata24.keytab

# kerberos expire time, the unit is hour
kerberos.expire.time=1
```

### What you expected to happen

Issue log:

> java.io.IOException: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "researchdata24/10.189.24.24"; destination host is: "researchdata30":8020;
>     at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:776)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1479)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1412)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
>     at com.sun.proxy.$Proxy94.getFileInfo(Unknown Source)
>     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:771)
>     at sun.reflect.GeneratedMethodAccessor183.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>     at com.sun.proxy.$Proxy95.getFileInfo(Unknown Source)
>     at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2108)
>     at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1305)
>     at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
>     at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1317)
>     at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:464)
>     at org.apache.dolphinscheduler.common.utils.HadoopUtils.copyHdfsToLocal(HadoopUtils.java:390)
>     at org.apache.dolphinscheduler.common.utils.HadoopUtils.download(HadoopUtils.java:319)
>     at org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread.downloadResource(TaskExecuteThread.java:322)
>     at org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread.run(TaskExecuteThread.java:173)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
>     at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:57)
>     at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
>     at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:687)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
>     at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:650)
>     at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:737)
>     at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375)
>     at org.apache.hadoop.ipc.Client.getConnection(Client.java:1528)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1451)
>     ... 27 common frames omitted
> Caused by: javax.security.sasl.SaslException: GSS initiate failed
>     at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
>     at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:414)
>     at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:560)
>     at org.apache.hadoop.ipc.Client$Connection.access$1900(Client.java:375)
>     at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:729)
>     at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:725)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
>     at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:725)
>     ... 30 common frames omitted
> Caused by: org.ietf.jgss.GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
>     at sun.security.jgss.krb5.Krb5InitCredential.getInstance(Krb5InitCredential.java:147)
>     at sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:122)
>     at sun.security.jgss.krb5.Krb5MechFactory.getMechanismContext(Krb5MechFactory.java:187)
>     at sun.security.jgss.GSSManagerImpl.getMechanismContext(GSSManagerImpl.java:224)
>     at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:212)
>     at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:179)
>     at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:192)

### How to reproduce

The failure recurs periodically in cluster mode with Kerberos authentication enabled. The only workaround I have found is restarting the worker-server, which is not advisable.

### Anything else

_No response_

### Version

3.1.x

### Are you willing to submit PR?

- [ ] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
