[ https://issues.apache.org/jira/browse/FALCON-1595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Balu Vellanki updated FALCON-1595:
----------------------------------
Summary: In secure cluster, Falcon server loses ability to communicate with
HDFS over time (was: Falcon server loses ability to communicate with HDFS over
time)
> In secure cluster, Falcon server loses ability to communicate with HDFS over
> time
> ---------------------------------------------------------------------------------
>
> Key: FALCON-1595
> URL: https://issues.apache.org/jira/browse/FALCON-1595
> Project: Falcon
> Issue Type: Bug
> Affects Versions: 0.8
> Reporter: Balu Vellanki
> Assignee: Balu Vellanki
> Fix For: 0.9
>
> Attachments: FALCON-1595.patch
>
>
> In a Kerberos-secured cluster where the Kerberos ticket validity is one day,
> the Falcon server eventually lost the ability to read from and write to HDFS.
> In the logs we saw typical Kerberos-related errors such as "GSSException: No
> valid credentials provided (Mechanism level: Failed to find any Kerberos
> tgt)".
> {code}
> 2015-10-28 00:04:59,517 INFO - [LaterunHandler:] ~ Creating FS impersonating
> user testUser (HadoopClientFactory:197)
> 2015-10-28 00:04:59,519 WARN - [LaterunHandler:] ~ Exception encountered
> while connecting to the server : javax.security.sasl.SaslException: GSS
> initiate failed [Caused by GSSException: No valid credentials provided
> (Mechanism level: Failed to find any Kerberos tgt)] (Client:680)
> 2015-10-28 00:04:59,520 WARN - [LaterunHandler:] ~ Late Re-run failed for
> instance sample-process:2015-10-28T03:58Z after 420000
> (AbstractRerunConsumer:84)
> java.io.IOException: Failed on local exception: java.io.IOException:
> javax.security.sasl.SaslException: GSS initiate failed [Caused by
> GSSException: No valid credentials provided (Mechanism level: Failed to find
> any Kerberos tgt)]; Host Details : local host is:
> "sample.host.com/127.0.0.1"; destination host is: "sample.host.com":8020;
> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:773)
> at org.apache.hadoop.ipc.Client.call(Client.java:1431)
> at org.apache.hadoop.ipc.Client.call(Client.java:1358)
> ...
> Caused by: java.io.IOException: javax.security.sasl.SaslException: GSS
> initiate failed [Caused by GSSException: No valid credentials provided
> (Mechanism level: Failed to find any Kerberos tgt)]
> at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:685)
> ...
> {code}
> The root cause is that the Kerberos TGT can expire. The TGT must be valid at
> the moment Falcon connects to the NameNode (or any other Kerberos-protected
> server), not merely when uri.getAuthority() is evaluated. The best place in
> the code to ensure this is HadoopClientFactory.createFileSystem(...).
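The timing problem described above can be illustrated with a minimal, self-contained sketch. It uses hypothetical classes (`FakeClock`, `KeytabLogin`, `ClientFactory`) and is not Falcon's or Hadoop's actual code. The sketch assumes a simulated clock and a one-day ticket lifetime. Its only purpose is to show why checking the TGT inside `createFileSystem`, right before connecting, survives expiry while a single login at startup does not.

In real Hadoop code, the analogous call would presumably be `UserGroupInformation.checkTGTAndReloginFromKeytab()`. That is an assumption about how the patch works, not a description of it. Hadoop's real refresh policy also differs from the fixed five-minute margin used here.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical mutable clock so the sketch can simulate the passage of a day.
final class FakeClock {
    private Instant now = Instant.parse("2015-10-27T00:00:00Z");
    Instant now() { return now; }
    void advance(Duration d) { now = now.plus(d); }
}

// Hypothetical stand-in for a keytab login: each login yields a TGT that
// expires a fixed lifetime later (one day in the report).
final class KeytabLogin {
    private final FakeClock clock;
    private final Duration lifetime;
    private Instant tgtExpiry;
    private int loginCount;

    KeytabLogin(FakeClock clock, Duration lifetime) {
        this.clock = clock;
        this.lifetime = lifetime;
        relogin(); // initial login, e.g. at server startup
    }

    private void relogin() {
        tgtExpiry = clock.now().plus(lifetime);
        loginCount++;
    }

    boolean hasValidTgt() { return clock.now().isBefore(tgtExpiry); }

    int loginCount() { return loginCount; }

    // Re-login when the TGT has expired or will expire within `margin`.
    // Loosely modeled on UserGroupInformation.checkTGTAndReloginFromKeytab();
    // the fixed margin is a simplification of Hadoop's real policy.
    void checkTgtAndRelogin(Duration margin) {
        if (!clock.now().plus(margin).isBefore(tgtExpiry)) {
            relogin();
        }
    }
}

// Hypothetical factory: checkBeforeConnect chooses whether the TGT is checked
// at the point of use (inside createFileSystem) or only at startup.
final class ClientFactory {
    private final KeytabLogin login;
    private final boolean checkBeforeConnect;

    ClientFactory(KeytabLogin login, boolean checkBeforeConnect) {
        this.login = login;
        this.checkBeforeConnect = checkBeforeConnect;
    }

    String createFileSystem(String uri) {
        if (checkBeforeConnect) {
            login.checkTgtAndRelogin(Duration.ofMinutes(5));
        }
        if (!login.hasValidTgt()) {
            // Mirrors the logged symptom: no valid credentials at connect time.
            throw new IllegalStateException(
                "No valid credentials provided: Failed to find any Kerberos tgt");
        }
        return "connected to " + uri;
    }
}
```

During the first simulated day both variants connect. After 25 simulated hours, the startup-only factory throws. The factory that checks inside `createFileSystem` logs in again first and then connects, which is the placement the report argues for.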
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)