ZeonHuang opened a new issue, #14439:
URL: https://github.com/apache/dolphinscheduler/issues/14439

   ### Search before asking
   
   - [X] I had searched in the 
[issues](https://github.com/apache/dolphinscheduler/issues?q=is%3Aissue) and 
found no similar issues.
   
   
   ### What happened
   
I have been using **DolphinScheduler Cluster** for several months, and I find 
that the worker-server periodically fails to **authenticate with Kerberos** 
and must be restarted to recover. Details below:
   Application Version: **3.0.2**
   Run mode: Cluster
   HDFS: Built by CDH5, version should be 2.6.0
   
   **Kerberos config in worker-server:**
   
   ```
   # whether to startup kerberos
   hadoop.security.authentication.startup.state=true
   
   # java.security.krb5.conf path
   java.security.krb5.conf.path=/etc/krb5.conf
   
   # login user from keytab username
   login.user.keytab.username=hdfs/[email protected]
   
   # login user from keytab path
   login.user.keytab.path=/etc/security/keytab/researchdata24.keytab
   
   # kerberos expire time, the unit is hour
   kerberos.expire.time=1
   ```
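
   For context on what `kerberos.expire.time` is meant to drive, a periodic keytab relogin loop can be sketched as below. This is a hypothetical, self-contained sketch (class and method names are mine, not DolphinScheduler's); in the real worker the `relogin` action would be a Hadoop call such as `UserGroupInformation.loginUserFromKeytab(...)` or `UserGroupInformation.getLoginUser().checkTGTAndReloginFromKeytab()`.

   ```java
   import java.util.concurrent.Executors;
   import java.util.concurrent.ScheduledExecutorService;
   import java.util.concurrent.TimeUnit;

   // Hypothetical sketch: refresh the Kerberos TGT on a fixed interval
   // (e.g. kerberos.expire.time=1 hour) so it is renewed before it expires.
   public class KerberosReloginSketch {
       public static ScheduledExecutorService schedule(Runnable relogin, long intervalHours) {
           ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
           // Initial delay 0: authenticate immediately on startup, then
           // re-run the relogin action every intervalHours.
           executor.scheduleWithFixedDelay(relogin, 0, intervalHours, TimeUnit.HOURS);
           return executor;
       }
   }
   ```

   If the scheduled relogin ever stops firing (or the relogin call itself fails silently), the TGT expires and every subsequent HDFS RPC fails with exactly the "Failed to find any Kerberos tgt" error shown below, until the process is restarted.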
   
   ### What you expected to happen
   
   Issue log: 
   
   > java.io.IOException: Failed on local exception: java.io.IOException: 
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: 
No valid credentials provided (Mechanism level: Failed to find any Kerberos 
tgt)]; Host Details : local host is: "researchdata24/10.189.24.24"; destination 
host is: "researchdata30":8020; 
   >         at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:776)
   >         at org.apache.hadoop.ipc.Client.call(Client.java:1479)
   >         at org.apache.hadoop.ipc.Client.call(Client.java:1412)
   >         at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
   >         at com.sun.proxy.$Proxy94.getFileInfo(Unknown Source)
   >         at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:771)
   >         at sun.reflect.GeneratedMethodAccessor183.invoke(Unknown Source)
   >         at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   >         at java.lang.reflect.Method.invoke(Method.java:498)
   >         at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
   >         at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
   >         at com.sun.proxy.$Proxy95.getFileInfo(Unknown Source)
   >         at 
org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2108)
   >         at 
org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1305)
   >         at 
org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
   >         at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
   >         at 
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1317)
   >         at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:464)
   >         at 
org.apache.dolphinscheduler.common.utils.HadoopUtils.copyHdfsToLocal(HadoopUtils.java:390)
   >         at 
org.apache.dolphinscheduler.common.utils.HadoopUtils.download(HadoopUtils.java:319)
   >         at 
org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread.downloadResource(TaskExecuteThread.java:322)
   >         at 
org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread.run(TaskExecuteThread.java:173)
   >         at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
   >         at 
com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
   >         at 
com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:57)
   >         at 
com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
   >         at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   >         at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   >         at java.lang.Thread.run(Thread.java:748)
   > Caused by: java.io.IOException: javax.security.sasl.SaslException: GSS 
initiate failed [Caused by GSSException: No valid credentials provided 
(Mechanism level: Failed to find any Kerberos tgt)]
   >         at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:687)
   >         at java.security.AccessController.doPrivileged(Native Method)
   >         at javax.security.auth.Subject.doAs(Subject.java:422)
   >         at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
   >         at 
org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:650)
   >         at 
org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:737)
   >         at 
org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375)
   >         at org.apache.hadoop.ipc.Client.getConnection(Client.java:1528)
   >         at org.apache.hadoop.ipc.Client.call(Client.java:1451)
   >         ... 27 common frames omitted
   > Caused by: javax.security.sasl.SaslException: GSS initiate failed
   >         at 
com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
   >         at 
org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:414)
   >         at 
org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:560)
   >         at 
org.apache.hadoop.ipc.Client$Connection.access$1900(Client.java:375)
   >         at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:729)
   >         at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:725)
   >         at java.security.AccessController.doPrivileged(Native Method)
   >         at javax.security.auth.Subject.doAs(Subject.java:422)
   >         at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
   >         at 
org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:725)
   >         ... 30 common frames omitted
   > Caused by: org.ietf.jgss.GSSException: No valid credentials provided 
(Mechanism level: Failed to find any Kerberos tgt)
   >         at 
sun.security.jgss.krb5.Krb5InitCredential.getInstance(Krb5InitCredential.java:147)
   >         at 
sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:122)
   >         at 
sun.security.jgss.krb5.Krb5MechFactory.getMechanismContext(Krb5MechFactory.java:187)
   >         at 
sun.security.jgss.GSSManagerImpl.getMechanismContext(GSSManagerImpl.java:224)
   >         at 
sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:212)
   >         at 
sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:179)
   >         at 
com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:192)
   
   ### How to reproduce
   
   The failure occurs regularly in cluster mode with Kerberos authentication 
   enabled. The only workaround I have found is restarting the worker-server, 
   which is not a sustainable fix.
   
   
   
   ### Anything else
   
   _No response_
   
   ### Version
   
   3.1.x
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: 
[email protected]

For queries about this service, please contact Infrastructure at:
[email protected]