[
https://issues.apache.org/jira/browse/HADOOP-15378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Wei-Chiu Chuang updated HADOOP-15378:
-------------------------------------
Affects Version/s: 2.6.0
> Hadoop client unable to relogin because a remote DataNode has an incorrect
> krb5.conf
> ------------------------------------------------------------------------------------
>
> Key: HADOOP-15378
> URL: https://issues.apache.org/jira/browse/HADOOP-15378
> Project: Hadoop Common
> Issue Type: Bug
> Components: security
> Affects Versions: 2.6.0
> Environment: CDH5.8.3, Kerberized, Impala
> Reporter: Wei-Chiu Chuang
> Priority: Critical
>
> This is a very weird bug.
> We received a report where a Hadoop client (Impala Catalog server) failed to
> relogin and crashed every several hours. Initial indication suggested the
> symptom matched HADOOP-13433.
> But after we patched HADOOP-13433 (as well as HADOOP-15143), Impala Catalog
> server still kept crashing.
>
> {noformat}
> W0114 05:49:24.676743 41444 UserGroupInformation.java:1838]
> PriviledgedActionException as:impala/[email protected]
> (auth:KERBEROS)
> cause:org.apache.hadoop.ipc.RemoteException(javax.security.sasl.SaslException):
> Failure to initialize security context
> W0114 05:49:24.680363 41444 UserGroupInformation.java:1137] The first
> kerberos ticket is not TGT(the server principal is
> hdfs/[email protected]), remove and destroy it.
> W0114 05:49:24.680501 41444 UserGroupInformation.java:1137] The first
> kerberos ticket is not TGT(the server principal is
> hdfs/[email protected]), remove and destroy it.
> W0114 05:49:24.680593 41444 UserGroupInformation.java:1153] Warning, no
> kerberos ticket found while attempting to renew ticket{noformat}
> The error “Failure to initialize security context” is suspicious here.
> Catalogd was unable to log in because of a Kerberos issue. The JDK expects
> the first kerberos ticket of a principal to be a TGT, however it seems that
> after this error, because it was unable to login successfully, the first
> ticket was no longer a TGT. The patch HADOOP-13433 removed other tickets of
> the principal, because it expects the TGT to be in the principal’s ticket,
> which is untrue in this case. So finally, it removed all tickets.
> And then
> {noformat}
> W0114 05:49:24.681946 41443 UserGroupInformation.java:1838]
> PriviledgedActionException as:impala/[email protected]
> (auth:KERBEROS) cause:javax.security.sasl.SaslException: GSS initiate failed
> [Caused by GSSException: No valid credentials provided (Mechanism level:
> Failed to find any Kerberos tgt)]
> {noformat}
> The error “Failed to find any Kerberos tgt” is typically an indication that
> the user’s Kerberos ticket has expired. However, that’s definitely not the
> case here, since it was just a little over 8 hours.
> After we patched HADOOP-13433, the error handling code exhibited NPE, as
> reported in HADOOP-15143.
>
> {code:java}
> I0114 05:50:26.758565 6384 RetryInvocationHandler.java:148] Exception while
> invoking listCachePools of class ClientNamenodeProtocolTranslatorPB over
> host4.example.com/10.0.121.66:8020 after 2 fail over attempts. Trying to fail
> over immediately. Java exception follows: java.io.IOException: Failed on
> local exception: java.io.IOException: Couldn't set up IO streams; Host
> Details : local host is: "host1.example.com/10.0.121.45"; destination host
> is: "host4.example.com":8020; at
> org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772) at
> org.apache.hadoop.ipc.Client.call(Client.java:1506) at
> org.apache.hadoop.ipc.Client.call(Client.java:1439) at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
> at com.sun.proxy.$Proxy9.listCachePools(Unknown Source) at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.listCachePools(ClientNamenodeProtocolTranslatorPB.java:1261)
> at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source) at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498) at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
> at com.sun.proxy.$Proxy10.listCachePools(Unknown Source) at
> org.apache.hadoop.hdfs.protocol.CachePoolIterator.makeRequest(CachePoolIterator.java:55)
> at
> org.apache.hadoop.hdfs.protocol.CachePoolIterator.makeRequest(CachePoolIterator.java:33)
> at
> org.apache.hadoop.fs.BatchedRemoteIterator.makeRequest(BatchedRemoteIterator.java:77)
> at
> org.apache.hadoop.fs.BatchedRemoteIterator.makeRequestIfNeeded(BatchedRemoteIterator.java:85)
> at
> org.apache.hadoop.fs.BatchedRemoteIterator.hasNext(BatchedRemoteIterator.java:99)
> at
> com.cloudera.impala.catalog.CatalogServiceCatalog$CachePoolReader.run(CatalogServiceCatalog.java:193)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745) Caused by: java.io.IOException:
> Couldn't set up IO streams at
> org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:826) at
> org.apache.hadoop.ipc.Client$Connection.access$3000(Client.java:396) at
> org.apache.hadoop.ipc.Client.getConnection(Client.java:1555) at
> org.apache.hadoop.ipc.Client.call(Client.java:1478) ... 23 more Caused by:
> java.lang.NullPointerException at
> org.apache.hadoop.security.UserGroupInformation.fixKerberosTicketOrder(UserGroupInformation.java:1136)
> at
> org.apache.hadoop.security.UserGroupInformation.reloginFromTicketCache(UserGroupInformation.java:1272)
> at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:697) at
> java.security.AccessController.doPrivileged(Native Method) at
> javax.security.auth.Subject.doAs(Subject.java:422) at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1835)
> at
> org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:681)
> at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:769)
> ... 26 more
> {code}
>
> In any case, HADOOP-15143 does not fix the problem, since valid credentials
> were already removed from the user.
> Also, even after HADOOP-13433 was applied, I still saw the following error:
> {noformat}
> W0113 14:33:44.254727 1255277 UserGroupInformation.java:1838]
> PriviledgedActionException as:impala/[email protected]
> (auth:KERBEROS) cause:javax.security.sasl.SaslException: GSS initiate failed
> [Caused by GSSException: No valid credentials provided (Mechanism level: The
> ticket isn't for us (35) - BAD TGS SERVER NAME)]
> {noformat}
>
> We traced the SaslException back to a particular remote DataNode. Finally, it
> turns out that the DataNode's host has an incorrect krb5.conf, which points
> to a decommissioned KDC, inconsistent with the rest of the hosts in the
> cluster. Upon authentication, it seems the error handling for the
> SaslException has a race condition, and occasionally it removes valid
> credential from the UGI. When we corrected the krb5.conf on that host, this
> error disappeared.
> This is not easy to reproduce, but we are told this issue seems to occur
> after Impala user performs a "refresh" command. Presumably this command
> forces Impala catalog server to connect to the problematic DataNode and
> triggered the buggy code.
>
> I don't have a patch to fix it but I want to raise this issue for discussion.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]