-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/40103/#review108562
-----------------------------------------------------------

Ship it!


Ship It!

- Venkat Ranganathan


On Nov. 9, 2015, 1 p.m., Balu Vellanki wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/40103/
> -----------------------------------------------------------
> 
> (Updated Nov. 9, 2015, 1 p.m.)
> 
> 
> Review request for Falcon, Ajay Yadava, Sowmya Ramesh, and Venkat Ranganathan.
> 
> 
> Bugs: FALCON-1595
>     https://issues.apache.org/jira/browse/FALCON-1595
> 
> 
> Repository: falcon-git
> 
> 
> Description
> -------
> 
> In a Kerberos-secured cluster where the Kerberos ticket validity is one day, 
> the Falcon server eventually lost the ability to read from and write to HDFS. 
> The logs showed typical Kerberos errors such as "GSSException: No valid 
> credentials provided (Mechanism level: Failed to find any Kerberos tgt)". 
> 
> {code}
> 2015-10-28 00:04:59,517 INFO  - [LaterunHandler:] ~ Creating FS impersonating 
> user testUser (HadoopClientFactory:197)
> 2015-10-28 00:04:59,519 WARN  - [LaterunHandler:] ~ Exception encountered 
> while connecting to the server : javax.security.sasl.SaslException: GSS 
> initiate failed [Caused by GSSException: No valid credentials provided 
> (Mechanism level: Failed to find any Kerberos tgt)] (Client:680)
> 2015-10-28 00:04:59,520 WARN  - [LaterunHandler:] ~ Late Re-run failed for 
> instance sample-process:2015-10-28T03:58Z after 420000 
> (AbstractRerunConsumer:84)
> java.io.IOException: Failed on local exception: java.io.IOException: 
> javax.security.sasl.SaslException: GSS initiate failed [Caused by 
> GSSException: No valid credentials provided (Mechanism level: Failed to find 
> any Kerberos tgt)]; Host Details : local host is: 
> "sample.host.com/127.0.0.1"; destination host is: "sample.host.com":8020; 
>       at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:773)
>       at org.apache.hadoop.ipc.Client.call(Client.java:1431)
>       at org.apache.hadoop.ipc.Client.call(Client.java:1358)
>       at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
>       at com.sun.proxy.$Proxy22.getFileInfo(Unknown Source)
>       at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:771)
>       at sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:497)
>       at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
>       at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>       at com.sun.proxy.$Proxy23.getFileInfo(Unknown Source)
>       at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2116)
>       at 
> org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1305)
>       at 
> org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
>       at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>       at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1301)
>       at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1424)
>       at 
> org.apache.falcon.rerun.handler.LateRerunConsumer.detectLate(LateRerunConsumer.java:108)
>       at 
> org.apache.falcon.rerun.handler.LateRerunConsumer.handleRerun(LateRerunConsumer.java:67)
>       at 
> org.apache.falcon.rerun.handler.LateRerunConsumer.handleRerun(LateRerunConsumer.java:47)
>       at 
> org.apache.falcon.rerun.handler.AbstractRerunConsumer.run(AbstractRerunConsumer.java:73)
>       at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: javax.security.sasl.SaslException: GSS 
> initiate failed [Caused by GSSException: No valid credentials provided 
> (Mechanism level: Failed to find any Kerberos tgt)]
>       at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:685)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:422)
>       at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>       at 
> org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:648)
>       at 
> org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:735)
>       at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:373)
>       at org.apache.hadoop.ipc.Client.getConnection(Client.java:1493)
>       at org.apache.hadoop.ipc.Client.call(Client.java:1397)
> {code}
> 
> 
> Diffs
> -----
> 
>   common/src/main/java/org/apache/falcon/hadoop/HadoopClientFactory.java 
> 9534ff2 
> 
> Diff: https://reviews.apache.org/r/40103/diff/
> 
> 
> Testing
> -------
> 
> End-to-end testing was done on a two-node secure cluster. Updated krb5.conf 
> with ticket_lifetime and renew_lifetime both set to 1 day. Ran Falcon for 
> more than two days and Falcon had no issues accessing HDFS.
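> 
> For reference (an illustrative sketch, not the actual file from the test 
> cluster), the ticket_lifetime and renew_lifetime values described above 
> would appear in the [libdefaults] section of krb5.conf roughly like this:
> 
> {code}
> [libdefaults]
>     # Limit TGT validity to one day to reproduce/verify the expiry scenario
>     ticket_lifetime = 1d
>     renew_lifetime = 1d
> {code}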
> 
> 
> Thanks,
> 
> Balu Vellanki
> 
>