[
https://issues.apache.org/jira/browse/HDFS-3972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Todd Lipcon updated HDFS-3972:
------------------------------
Attachment: hdfs-3972.txt
The root cause is that, in HA, the TrashEmptier is started from the context of
an RPC rather than from the NN startup code. As a result, it was running as the
administrator who called transitionToActive (a remote, non-keytab user) rather
than as the NN login user.
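To illustrate the mechanism, here is a minimal plain-Java sketch. It is not the
attached patch and does not use Hadoop classes: the class name, the
CURRENT_USER thread-local, the doAs helper, and the user names are all
hypothetical stand-ins for the real security context. It only models the idea
that a thread created inside an RPC handler inherits the caller's identity
unless its creation is wrapped in a doAs for the login user (in Hadoop that
would presumably go through UserGroupInformation.getLoginUser().doAs(...)).

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

public class TrashEmptierContextSketch {
    // Hypothetical stand-in for the security context. A new thread copies
    // the value held by the thread that creates it.
    static final InheritableThreadLocal<String> CURRENT_USER =
        new InheritableThreadLocal<>();

    // Hypothetical identities, for illustration only.
    static final String LOGIN_USER = "nn-login-user";
    static final String RPC_CALLER = "remote-admin";

    // Runs the action with the given identity, then restores the old one.
    static String doAs(String user, Supplier<String> action) {
        String previous = CURRENT_USER.get();
        CURRENT_USER.set(user);
        try {
            return action.get();
        } finally {
            CURRENT_USER.set(previous);
        }
    }

    // Starts a background "emptier" thread and reports the identity it sees.
    static String startEmptier() {
        AtomicReference<String> seen = new AtomicReference<>();
        Thread emptier =
            new Thread(() -> seen.set(CURRENT_USER.get()), "Trash Emptier");
        emptier.start();
        try {
            emptier.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen.get();
    }

    // Models the bug: the emptier is created directly inside the RPC
    // handler, so it inherits the remote caller's identity.
    static String unpatchedTransition() {
        return doAs(RPC_CALLER, TrashEmptierContextSketch::startEmptier);
    }

    // Models one possible fix: thread creation is wrapped in a doAs for the
    // login user, even though the enclosing RPC runs as the remote caller.
    static String patchedTransition() {
        return doAs(RPC_CALLER,
            () -> doAs(LOGIN_USER, TrashEmptierContextSketch::startEmptier));
    }

    public static void main(String[] args) {
        System.out.println("unpatched: " + unpatchedTransition()); // remote-admin
        System.out.println("patched:   " + patchedTransition());   // nn-login-user
    }
}
```

In this model the identity is fixed when the thread is created, which is why
the wrapping has to happen around the thread start rather than inside the
thread's own run loop.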
The attached patch fixes the issue. It doesn't include a new unit test, since
the issue only occurs on secure clusters, which aren't automatable.
I tested manually as follows:
- configured a secure HA cluster
- set fs.trash.interval and fs.trash.checkpoint.interval to 1 in core-site.xml
- started the NN and transitioned it to active
-- without the patch, I saw the error mentioned above. With the patch, no error.
- created and deleted a file, which got moved to trash
- waited a minute, and saw that the trash checkpointer moved the file correctly
-- without the patch, I saw the error again. With the patch, it worked.
> Trash emptier fails in secure HA cluster
> ----------------------------------------
>
> Key: HDFS-3972
> URL: https://issues.apache.org/jira/browse/HDFS-3972
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: name-node
> Affects Versions: 3.0.0, 2.0.1-alpha
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
> Priority: Critical
> Attachments: hdfs-3972.txt
>
>
> In a secure HA cluster, we're seeing the following issue on the NN when the
> trash emptier tries to run:
> WARN org.apache.hadoop.fs.TrashPolicyDefault: Trash can't list homes:
> java.io.IOException: Failed on local exception: java.io.IOException:
> javax.security.sasl.SaslException: GSS initiate failed [Caused by
> GSSException: No valid credentials provided (Mechanism level: Failed to find
> any Kerberos tgt)]; Host Details : local host is: "xxxxx"; destination host
> is: "xxxx":8020; Sleeping.
> The issue seems to be that the trash emptier thread sends RPCs back to
> itself, but isn't wrapped in a doAs. Credit goes to Stephen Chu for
> discovering this.