[
https://issues.apache.org/jira/browse/HADOOP-18581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17651611#comment-17651611
]
ASF GitHub Bot commented on HADOOP-18581:
-----------------------------------------
surendralilhore commented on code in PR #5248:
URL: https://github.com/apache/hadoop/pull/5248#discussion_r1056248810
##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java:
##########
@@ -2206,7 +2206,25 @@ private void saslProcess(RpcSaslProto saslMessage)
AUDITLOG.warn(AUTH_FAILED_FOR + this.toString() + ":"
+ attemptingUser + " (" + e.getLocalizedMessage()
+ ") with true cause: (" + tce.getLocalizedMessage() + ")");
- throw tce;
+ if (!UserGroupInformation.getLoginUser().isLoginSuccess()) {
+ LOG.info("Initiating re-login from IPC Server");
+ if (UserGroupInformation.isLoginKeytabBased()) {
+ UserGroupInformation.getLoginUser().reloginFromKeytab();
Review Comment:
> Would that still leave a server potentially in a bad state for up to 60
seconds?
Yes, the server can remain in a bad state for up to 60 seconds. Previously, the
only option was to restart the server.
Below is the log for that 60-second window from my test cluster; after 60
seconds the re-login succeeds:
```
2022-12-23 10:27:19,117 INFO ipc.Server - Auth successful for
hive/[email protected] (auth:KERBEROS)
2022-12-23 10:27:19,121 INFO authorize.ServiceAuthorizationManager -
Authorization successful for hive/[email protected]
(auth:KERBEROS) for protocol=interface
org.apache.hadoop.hdfs.protocol.ClientProtocol
2022-12-23 10:27:27,048 ERROR namenode.NameNode - Dummy logout thread...
org.apache.hadoop.security.KerberosAuthException: Login failure for user:
nn/[email protected] javax.security.auth.login.LoginException:
Re-login failed
at
org.apache.hadoop.security.UserGroupInformation.unprotectedRelogin(UserGroupInformation.java:1203)
at
org.apache.hadoop.hdfs.server.namenode.NameNode$2.run(NameNode.java:1590)
Caused by: javax.security.auth.login.LoginException: Re-login failed
at
org.apache.hadoop.security.UserGroupInformation.unprotectedRelogin(UserGroupInformation.java:1188)
... 1 more
2022-12-23 10:27:28,786 WARN ipc.Server - Auth failed for
10.x.y.z:46879:null (GSS initiate failed) with true cause: (GSS initiate failed)
2022-12-23 10:27:28,786 INFO ipc.Server - Initiating re-login from IPC
Server
2022-12-23 10:27:28,786 INFO ipc.Server - Doing login from keytab
2022-12-23 10:27:28,786 WARN security.UserGroupInformation - Not attempting
to re-login since the last re-login was attempted less than 60 seconds before.
Last Login=1671791247048
.
.
.
.
.
2022-12-23 10:28:27,618 WARN ipc.Server - Auth failed for
10.x.y.z:45329:null (GSS initiate failed) with true cause: (GSS initiate failed)
2022-12-23 10:28:27,619 INFO ipc.Server - Initiating re-login from IPC
Server
2022-12-23 10:28:27,619 INFO ipc.Server - Doing login from keytab
2022-12-23 10:28:27,652 INFO ipc.Server - Retry Auth successful for
10.x.y.z:45329:null after failure
2022-12-23 10:28:27,655 INFO ipc.Server - Auth successful for
hive/[email protected] (auth:KERBEROS)
2022-12-23 10:28:27,667 INFO authorize.ServiceAuthorizationManager -
Authorization successful for hive/[email protected]
(auth:KERBEROS) for protocol=interface
org.apache.hadoop.hdfs.protocol.ClientProtocol
```
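The 60-second window in the log above comes from the re-login throttle: the attempt at 10:27:28,786 is skipped because the last attempt (Last Login=1671791247048, i.e. 10:27:27,048) was only 1.738 seconds earlier, while the attempt at 10:28:27,619 is 60.571 seconds later and goes through. A minimal sketch of that kind of throttle, assuming a fixed 60-second minimum interval as stated in the warning message. `ReloginThrottle` and `tryAttempt` are hypothetical names for illustration, not the actual `UserGroupInformation` code:

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration of a minimum-interval re-login throttle.
// This is NOT Hadoop's implementation; names and structure are assumptions.
public class ReloginThrottle {
    private final long minIntervalMs;
    private final AtomicLong lastAttemptMs;

    public ReloginThrottle(long minIntervalMs, long lastAttemptMs) {
        this.minIntervalMs = minIntervalMs;
        this.lastAttemptMs = new AtomicLong(lastAttemptMs);
    }

    /**
     * Returns true and records nowMs if at least minIntervalMs has passed
     * since the last recorded attempt; returns false otherwise.
     */
    public boolean tryAttempt(long nowMs) {
        while (true) {
            long last = lastAttemptMs.get();
            if (nowMs - last < minIntervalMs) {
                return false; // too soon, skip this re-login attempt
            }
            if (lastAttemptMs.compareAndSet(last, nowMs)) {
                return true; // this caller may perform the re-login
            }
            // Another thread updated the timestamp concurrently; re-check.
        }
    }

    public static void main(String[] args) {
        long lastLogin = 1671791247048L; // "Last Login" value from the log
        ReloginThrottle throttle = new ReloginThrottle(60_000L, lastLogin);

        // 10:27:28,786 is 1.738 s after the last attempt: throttled.
        System.out.println(throttle.tryAttempt(lastLogin + 1_738L));  // prints "false"
        // 10:28:27,619 is 60.571 s after the last attempt: allowed.
        System.out.println(throttle.tryAttempt(lastLogin + 60_571L)); // prints "true"
    }
}
```

With such a throttle, every failed authentication inside the window only logs the warning, so clients keep failing until the first request that arrives after the interval has elapsed.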
> Handle Server KDC re-login when Server and Client run in same JVM.
> ------------------------------------------------------------------
>
> Key: HADOOP-18581
> URL: https://issues.apache.org/jira/browse/HADOOP-18581
> Project: Hadoop Common
> Issue Type: Bug
> Affects Versions: 3.1.1
> Reporter: Surendra Singh Lilhore
> Assignee: Surendra Singh Lilhore
> Priority: Major
> Labels: pull-request-available
>
> Handle re-login in the Server when the client and server run in the same JVM
> and a client re-login attempt fails.
> For example, the NameNode is a server, but a JournalNode client also runs in
> the same JVM to push edit logs. When the JN client tries to re-login and
> fails, it also destroys the server's service ticket, and the NameNode is no
> longer able to serve client requests. The NameNode log then shows the errors below.
>
> {noformat}
> Auth failed for x.x.x.x:42199:null (GSS initiate failed) with true cause:
> (GSS initiate failed)
> Auth failed for x.x.x.x:42199:null (GSS initiate failed) with true cause:
> (GSS initiate failed)
> Auth failed for x.x.x.x:42199:null (GSS initiate failed) with true cause:
> (GSS initiate failed){noformat}
> The same issue was discussed in HADOOP-17996.