-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/40103/
-----------------------------------------------------------

Review request for Falcon, Ajay Yadava, Sowmya Ramesh, and Venkat Ranganathan.


Bugs: FALCON-1595
    https://issues.apache.org/jira/browse/FALCON-1595


Repository: falcon-git


Description
-------

In a Kerberos-secured cluster where the Kerberos ticket validity is one day, 
the Falcon server eventually lost the ability to read from and write to HDFS. 
The logs showed typical Kerberos-related errors such as "GSSException: No valid 
credentials provided (Mechanism level: Failed to find any Kerberos tgt)". 

{code}
2015-10-28 00:04:59,517 INFO  - [LaterunHandler:] ~ Creating FS impersonating user testUser (HadoopClientFactory:197)
2015-10-28 00:04:59,519 WARN  - [LaterunHandler:] ~ Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)] (Client:680)
2015-10-28 00:04:59,520 WARN  - [LaterunHandler:] ~ Late Re-run failed for instance sample-process:2015-10-28T03:58Z after 420000 (AbstractRerunConsumer:84)
java.io.IOException: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "sample.host.com/127.0.0.1"; destination host is: "sample.host.com":8020; 
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:773)
        at org.apache.hadoop.ipc.Client.call(Client.java:1431)
        at org.apache.hadoop.ipc.Client.call(Client.java:1358)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
        at com.sun.proxy.$Proxy22.getFileInfo(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:771)
        at sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
        at com.sun.proxy.$Proxy23.getFileInfo(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2116)
        at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1305)
        at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
        at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
        at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1301)
        at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1424)
        at org.apache.falcon.rerun.handler.LateRerunConsumer.detectLate(LateRerunConsumer.java:108)
        at org.apache.falcon.rerun.handler.LateRerunConsumer.handleRerun(LateRerunConsumer.java:67)
        at org.apache.falcon.rerun.handler.LateRerunConsumer.handleRerun(LateRerunConsumer.java:47)
        at org.apache.falcon.rerun.handler.AbstractRerunConsumer.run(AbstractRerunConsumer.java:73)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
        at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:685)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:648)
        at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:735)
        at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:373)
        at org.apache.hadoop.ipc.Client.getConnection(Client.java:1493)
        at org.apache.hadoop.ipc.Client.call(Client.java:1397)
{code}
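
For anyone trying to reproduce the "Failed to find any Kerberos tgt" condition, the following is a minimal diagnostic sketch using only JDK classes. It is not the change under review and does not touch HadoopClientFactory; the class name TgtCheck and the method hasCurrentTgt are hypothetical names chosen for illustration. It reports whether a JAAS Subject still holds a ticket-granting ticket that is within its validity period.

```java
import java.util.Set;
import javax.security.auth.Subject;
import javax.security.auth.kerberos.KerberosTicket;

// Hypothetical helper for illustration only; not part of Falcon or Hadoop.
public class TgtCheck {

    /**
     * Returns true if the subject holds a ticket-granting ticket (TGT)
     * that has not been destroyed and is still within its validity period.
     */
    public static boolean hasCurrentTgt(Subject subject) {
        Set<KerberosTicket> tickets =
                subject.getPrivateCredentials(KerberosTicket.class);
        for (KerberosTicket ticket : tickets) {
            // Destroyed tickets have had their fields cleared; skip them.
            if (ticket.isDestroyed()) {
                continue;
            }
            // A TGT is issued for a service principal of the form krbtgt/REALM@REALM.
            String server = ticket.getServer().getName();
            if (server.startsWith("krbtgt/") && ticket.isCurrent()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // An empty Subject carries no credentials, so no TGT can be found.
        System.out.println(hasCurrentTgt(new Subject())); // prints "false"
    }
}
```

In a long-running server the Subject to inspect would be the one backing the login user; how that Subject is obtained depends on the deployment and is left out of this sketch.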


Diffs
-----

  common/src/main/java/org/apache/falcon/hadoop/HadoopClientFactory.java 9534ff2 

Diff: https://reviews.apache.org/r/40103/diff/


Testing
-------

End-to-end testing was done on a two-node secure cluster. Updated krb5.conf 
with ticket_lifetime set to 1 day and renew_lifetime set to 1 day. Ran Falcon 
for more than two days, and Falcon had no issues accessing HDFS.
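
The krb5.conf change described above might look like the fragment below. This is a sketch under assumptions: the review does not show the file, so the placement in the [libdefaults] section and the "1d" duration spelling are guesses at what was used on the test cluster.

```
# Assumed placement; the actual test cluster's krb5.conf is not shown in this review.
[libdefaults]
    ticket_lifetime = 1d
    renew_lifetime = 1d
```

Note that these client-side values are requests; the KDC can issue tickets with a shorter lifetime if its own limits are lower.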


Thanks,

Balu Vellanki
