[ 
https://issues.apache.org/jira/browse/FALCON-1595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15034574#comment-15034574
 ] 

Balu Vellanki commented on FALCON-1595:
---------------------------------------

[~sowmyaramesh]: I updated the Jira description to explain the root cause. 
The relogin in AuthenticationInitializationService does not handle this case. 
[~venkatnrangan] explained that the TGT must be valid when accessing the 
namenode or any other Kerberos-protected server, but not when calling 
uri.getAuthority() or running doAs operations that make no RPC/HTTP calls.

> Falcon server loses ability to communicate with HDFS over time
> --------------------------------------------------------------
>
>                 Key: FALCON-1595
>                 URL: https://issues.apache.org/jira/browse/FALCON-1595
>             Project: Falcon
>          Issue Type: Bug
>    Affects Versions: 0.8
>            Reporter: Balu Vellanki
>            Assignee: Balu Vellanki
>             Fix For: 0.9
>
>         Attachments: FALCON-1595.patch
>
>
> In a Kerberos-secured cluster where the Kerberos ticket lifetime is one day, 
> the Falcon server eventually lost the ability to read from and write to HDFS. 
> In the logs we saw typical Kerberos-related errors like "GSSException: No 
> valid credentials provided (Mechanism level: Failed to find any Kerberos 
> tgt)". 
> {code}
> 2015-10-28 00:04:59,517 INFO  - [LaterunHandler:] ~ Creating FS impersonating 
> user testUser (HadoopClientFactory:197)
> 2015-10-28 00:04:59,519 WARN  - [LaterunHandler:] ~ Exception encountered 
> while connecting to the server : javax.security.sasl.SaslException: GSS 
> initiate failed [Caused by GSSException: No valid credentials provided 
> (Mechanism level: Failed to find any Kerberos tgt)] (Client:680)
> 2015-10-28 00:04:59,520 WARN  - [LaterunHandler:] ~ Late Re-run failed for 
> instance sample-process:2015-10-28T03:58Z after 420000 
> (AbstractRerunConsumer:84)
> java.io.IOException: Failed on local exception: java.io.IOException: 
> javax.security.sasl.SaslException: GSS initiate failed [Caused by 
> GSSException: No valid credentials provided (Mechanism level: Failed to find 
> any Kerberos tgt)]; Host Details : local host is: 
> "sample.host.com/127.0.0.1"; destination host is: "sample.host.com":8020; 
>       at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:773)
>       at org.apache.hadoop.ipc.Client.call(Client.java:1431)
>       at org.apache.hadoop.ipc.Client.call(Client.java:1358)
>       ...
> Caused by: java.io.IOException: javax.security.sasl.SaslException: GSS 
> initiate failed [Caused by GSSException: No valid credentials provided 
> (Mechanism level: Failed to find any Kerberos tgt)]
>       at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:685)
>       ...
> {code}
> The root cause of the issue is that the TGT can expire. The TGT must be valid 
> when accessing the namenode or another Kerberos-protected server, but not when 
> calling uri.getAuthority(). The best place in the code to ensure the TGT is 
> valid is HadoopClientFactory.createFileSystem(...). 
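
The idea in the description (revalidate the ticket immediately before the authenticated call, and not around local-only work such as uri.getAuthority()) can be illustrated with a small self-contained sketch. This is not the attached patch and does not use Hadoop classes: KeytabLogin, remoteCall and this createFileSystem are hypothetical stand-ins, and the expiry check is simplified to "ticket has expired". In a real Hadoop client the refresh step would presumably be UserGroupInformation.checkTGTAndReloginFromKeytab(), which is an assumption about how the fix is wired, not something the thread states.

```java
import java.io.IOException;
import java.net.URI;
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.Callable;

/** Sketch: refresh the ticket right before a remote call, not on local-only work. */
public class TgtRefreshSketch {

    /** Hypothetical stand-in for a keytab-backed login; not a Hadoop class. */
    static final class KeytabLogin {
        private final Duration lifetime;
        private Instant expiresAt;
        private int reloginCount = 0;

        KeytabLogin(Duration lifetime, Instant now) {
            this.lifetime = lifetime;
            this.expiresAt = now.plus(lifetime);
        }

        boolean isValid(Instant now) {
            return now.isBefore(expiresAt);
        }

        /** Re-login only when the ticket has expired; otherwise a cheap no-op. */
        void checkAndRelogin(Instant now) {
            if (!isValid(now)) {
                expiresAt = now.plus(lifetime);
                reloginCount++;
            }
        }

        int reloginCount() {
            return reloginCount;
        }
    }

    /** Hypothetical remote call: fails like the log above when the ticket is stale. */
    static <T> T remoteCall(KeytabLogin login, Instant now, Callable<T> body) throws Exception {
        if (!login.isValid(now)) {
            throw new IOException("No valid credentials provided (Failed to find any Kerberos tgt)");
        }
        return body.call();
    }

    /** Analogue of createFileSystem: local work first, refresh, then the authenticated call. */
    static String createFileSystem(KeytabLogin login, Instant now, URI uri) throws Exception {
        String authority = uri.getAuthority(); // local only, needs no ticket
        login.checkAndRelogin(now);            // revalidate immediately before the RPC
        return remoteCall(login, now, () -> "connected to " + authority);
    }

    public static void main(String[] args) throws Exception {
        Instant start = Instant.parse("2015-10-27T00:00:00Z");
        Instant later = Instant.parse("2015-10-28T00:04:59Z"); // past the one-day lifetime
        KeytabLogin login = new KeytabLogin(Duration.ofHours(24), start);
        URI uri = URI.create("hdfs://sample.host.com:8020/tmp");
        System.out.println(createFileSystem(login, later, uri));
        System.out.println("relogins: " + login.reloginCount());
    }
}
```

Placing the check inside the factory method means every caller that obtains a file system gets a fresh ticket, instead of relying on a periodic background relogin happening at the right moment.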



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
