[ 
https://issues.apache.org/jira/browse/FALCON-1595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balu Vellanki updated FALCON-1595:
----------------------------------
    Description: 
In a Kerberos-secured cluster where the Kerberos ticket validity is one day, 
the Falcon server eventually lost the ability to read from and write to HDFS. 
In the logs we saw typical Kerberos-related errors like "GSSException: No valid 
credentials provided (Mechanism level: Failed to find any Kerberos tgt)". 

{code}
2015-10-28 00:04:59,517 INFO  - [LaterunHandler:] ~ Creating FS impersonating 
user testUser (HadoopClientFactory:197)
2015-10-28 00:04:59,519 WARN  - [LaterunHandler:] ~ Exception encountered while 
connecting to the server : javax.security.sasl.SaslException: GSS initiate 
failed [Caused by GSSException: No valid credentials provided (Mechanism level: 
Failed to find any Kerberos tgt)] (Client:680)
2015-10-28 00:04:59,520 WARN  - [LaterunHandler:] ~ Late Re-run failed for 
instance sample-process:2015-10-28T03:58Z after 420000 
(AbstractRerunConsumer:84)
java.io.IOException: Failed on local exception: java.io.IOException: 
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: 
No valid credentials provided (Mechanism level: Failed to find any Kerberos 
tgt)]; Host Details : local host is: "sample.host.com/127.0.0.1"; destination 
host is: "sample.host.com":8020; 
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:773)
        at org.apache.hadoop.ipc.Client.call(Client.java:1431)
        at org.apache.hadoop.ipc.Client.call(Client.java:1358)
        ...
Caused by: java.io.IOException: javax.security.sasl.SaslException: GSS initiate 
failed [Caused by GSSException: No valid credentials provided (Mechanism level: 
Failed to find any Kerberos tgt)]
        at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:685)
        ...
{code}

The root cause is that the TGT issued by the authentication server can expire, 
so it should be renewed periodically. The best place in the code to do this is 
in HadoopClientFactory.getFileSystem.
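
The renewal idea could be sketched roughly as follows. This is not the attached 
patch. TicketRenewer, FsHandle, CountingRenewer and RenewingClientFactory are 
hypothetical stand-ins so that the example compiles without Hadoop on the 
classpath. With Hadoop available, the renewal step would plausibly be a call to 
UserGroupInformation.getLoginUser().checkTGTAndReloginFromKeytab(), assuming 
the server logged in from a keytab.

```java
import java.io.IOException;

/** Hypothetical stand-in for the Kerberos login; not a Falcon or Hadoop type. */
interface TicketRenewer {
    /** Re-login if the TGT is missing or close to expiry. */
    void reloginIfNeeded() throws IOException;
}

/** Hypothetical stand-in for org.apache.hadoop.fs.FileSystem. */
interface FsHandle {
    String owner();
}

/** Test double that only counts how often a renewal was requested. */
final class CountingRenewer implements TicketRenewer {
    private int calls;

    @Override
    public void reloginIfNeeded() {
        calls++;
    }

    int calls() {
        return calls;
    }
}

/** Sketch of a client factory that refreshes credentials before handing out a handle. */
final class RenewingClientFactory {
    private final TicketRenewer renewer;

    RenewingClientFactory(TicketRenewer renewer) {
        this.renewer = renewer;
    }

    FsHandle getFileSystem(String user) throws IOException {
        // Assumption: with Hadoop on the classpath this step would be
        // UserGroupInformation.getLoginUser().checkTGTAndReloginFromKeytab();
        renewer.reloginIfNeeded();
        // A real factory would create the impersonated FileSystem here.
        return () -> user;
    }
}
```

Putting the check in the factory method means every code path that obtains a 
file system handle, such as the LaterunHandler in the log above, goes through 
the renewal check first, so no separate renewal thread is needed.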

  was:
In a kerberos secured cluster where the Kerberos ticket validity is one day, 
Falcon server eventually lost the ability to read and write to and from HDFS. 
In the logs we saw typical Kerberos-related errors like "GSSException: No valid 
credentials provided (Mechanism level: Failed to find any Kerberos tgt)". 

{code}
2015-10-28 00:04:59,517 INFO  - [LaterunHandler:] ~ Creating FS impersonating 
user testUser (HadoopClientFactory:197)
2015-10-28 00:04:59,519 WARN  - [LaterunHandler:] ~ Exception encountered while 
connecting to the server : javax.security.sasl.SaslException: GSS initiate 
failed [Caused by GSSException: No valid credentials provided (Mechanism level: 
Failed to find any Kerberos tgt)] (Client:680)
2015-10-28 00:04:59,520 WARN  - [LaterunHandler:] ~ Late Re-run failed for 
instance sample-process:2015-10-28T03:58Z after 420000 
(AbstractRerunConsumer:84)
java.io.IOException: Failed on local exception: java.io.IOException: 
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: 
No valid credentials provided (Mechanism level: Failed to find any Kerberos 
tgt)]; Host Details : local host is: "sample.host.com/127.0.0.1"; destination 
host is: "sample.host.com":8020; 
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:773)
        at org.apache.hadoop.ipc.Client.call(Client.java:1431)
        at org.apache.hadoop.ipc.Client.call(Client.java:1358)
        ...
Caused by: java.io.IOException: javax.security.sasl.SaslException: GSS initiate 
failed [Caused by GSSException: No valid credentials provided (Mechanism level: 
Failed to find any Kerberos tgt)]
        at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:685)
        ...
{code}

The root cause of the issue is that TGT can expire and the TGT issued by 
authentication server should be renewed periodically. The best location in code 
to do this is when 


> Falcon server loses ability to communicate with HDFS over time
> --------------------------------------------------------------
>
>                 Key: FALCON-1595
>                 URL: https://issues.apache.org/jira/browse/FALCON-1595
>             Project: Falcon
>          Issue Type: Bug
>    Affects Versions: 0.8
>            Reporter: Balu Vellanki
>            Assignee: Balu Vellanki
>             Fix For: 0.9
>
>         Attachments: FALCON-1595.patch
>
>
> In a Kerberos-secured cluster where the Kerberos ticket validity is one day, 
> the Falcon server eventually lost the ability to read from and write to HDFS. 
> In the logs we saw typical Kerberos-related errors like "GSSException: No 
> valid credentials provided (Mechanism level: Failed to find any Kerberos 
> tgt)". 
> {code}
> 2015-10-28 00:04:59,517 INFO  - [LaterunHandler:] ~ Creating FS impersonating 
> user testUser (HadoopClientFactory:197)
> 2015-10-28 00:04:59,519 WARN  - [LaterunHandler:] ~ Exception encountered 
> while connecting to the server : javax.security.sasl.SaslException: GSS 
> initiate failed [Caused by GSSException: No valid credentials provided 
> (Mechanism level: Failed to find any Kerberos tgt)] (Client:680)
> 2015-10-28 00:04:59,520 WARN  - [LaterunHandler:] ~ Late Re-run failed for 
> instance sample-process:2015-10-28T03:58Z after 420000 
> (AbstractRerunConsumer:84)
> java.io.IOException: Failed on local exception: java.io.IOException: 
> javax.security.sasl.SaslException: GSS initiate failed [Caused by 
> GSSException: No valid credentials provided (Mechanism level: Failed to find 
> any Kerberos tgt)]; Host Details : local host is: 
> "sample.host.com/127.0.0.1"; destination host is: "sample.host.com":8020; 
>       at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:773)
>       at org.apache.hadoop.ipc.Client.call(Client.java:1431)
>       at org.apache.hadoop.ipc.Client.call(Client.java:1358)
>       ...
> Caused by: java.io.IOException: javax.security.sasl.SaslException: GSS 
> initiate failed [Caused by GSSException: No valid credentials provided 
> (Mechanism level: Failed to find any Kerberos tgt)]
>       at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:685)
>       ...
> {code}
> The root cause is that the TGT issued by the authentication server can 
> expire, so it should be renewed periodically. The best place in the code to 
> do this is in HadoopClientFactory.getFileSystem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
