[
https://issues.apache.org/jira/browse/HADOOP-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14235887#comment-14235887
]
Tobi Vollebregt commented on HADOOP-8828:
-----------------------------------------
Any progress on this?
I ran into a variant of this recently when using HBase ExportSnapshot to copy
from a secure to an insecure cluster. I was able to work around it with a
custom WebHDFS implementation for the insecure cluster, which I selected via
{{-Dfs.webhdfs.impl=<classpath>.SimpleAuthWebHdfsFileSystem}}:
{code:java}
import java.io.IOException;

import org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenIdentifier;
import org.apache.hadoop.hdfs.web.WebHdfsFileSystem;
import org.apache.hadoop.security.token.Token;

public class SimpleAuthWebHdfsFileSystem extends WebHdfsFileSystem {
  // Pretend this filesystem has no delegation tokens to hand out, so the
  // client never asks the insecure NN for one.
  @Override
  public Token<DelegationTokenIdentifier> getDelegationToken(String renewer)
      throws IOException {
    return null;
  }
}
{code}
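For anyone wanting to reproduce this: with that class on the job classpath, the invocation looks something like {{hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -Dfs.webhdfs.impl=com.example.SimpleAuthWebHdfsFileSystem -snapshot <name> -copy-to webhdfs://<insecure-nn>:50070/hbase}} (package name, snapshot name, and host are illustrative). Returning {{null}} just tells the token-collection step that this filesystem has no delegation token to offer, so job submission never asks the insecure cluster for one.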
The secure cluster is running Hadoop 2.3.0 / CDH 5.1.2; the insecure cluster
is running Hadoop 2.0.0 / CDH 4.6.0.
> Support distcp from secure to insecure clusters
> -----------------------------------------------
>
> Key: HADOOP-8828
> URL: https://issues.apache.org/jira/browse/HADOOP-8828
> Project: Hadoop Common
> Issue Type: Bug
> Reporter: Eli Collins
> Assignee: Haohui Mai
>
> Users currently can't distcp from secure to insecure clusters.
> Relevant background from ATM:
> There's no plumbing to make the HFTP client use AuthenticatedURL when
> security is enabled. This means that even though you have the servlet filter
> correctly configured on the server, the client doesn't know how to properly
> authenticate to that filter.
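> For reference, the plumbing being described would amount to opening the NN's
> HTTP connections through AuthenticatedURL, roughly like this sketch (the
> class is illustrative, not actual HFTP code):
> {code:java}
> import java.net.HttpURLConnection;
> import java.net.URL;
>
> import org.apache.hadoop.security.authentication.client.AuthenticatedURL;
>
> public class AuthenticatedUrlSketch {
>   public static HttpURLConnection open(URL url) throws Exception {
>     // The token caches the auth cookie across requests to the same host.
>     AuthenticatedURL.Token token = new AuthenticatedURL.Token();
>     // Performs the SPNEGO handshake against the server's auth filter
>     // before handing back the connection.
>     return new AuthenticatedURL().openConnection(url, token);
>   }
> }
> {code}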
> The crux of the issue is that security is enabled globally instead of
> per-file system. The trick of using HFTP as the source FS works when the
> source is insecure, but not when the source is secure.
> Normal cp with two hdfs:// URLs can be made to work. There is indeed logic in
> o.a.h.ipc.Client to fall back to using simple authentication if your client
> config has security enabled (hadoop.security.authentication set to
> "kerberos") and the server responds asking for simple authentication. The
> thing is, there are at least 3 bugs with this that I bumped into. All three
> can be worked around:
> 1) If your client config has security enabled you *must* have a valid
> Kerberos TGT, even if you're interacting with an insecure cluster. The hadoop
> client unfortunately tries to read the local ticket cache before it tries to
> connect to the server, and so doesn't know that it won't need Kerberos
> credentials.
> 2) Even though the destination NN is insecure, it has to have a Kerberos
> principal created for it. You don't need a keytab, and you don't need to
> change any settings on the destination NN. The principal just needs to exist
> in the principal database. This is again because the hadoop client will,
> before connecting to the remote NN, try to get a service ticket for the
> hdfs/f.q.d.n principal for the remote NN. If this fails, it won't even get to
> the part where it tries to connect to the insecure NN and falls back to
> simple auth.
> 3) Once you get through problems 1 and 2, you will try to connect to the
> remote, insecure NN. This will work, but the reported principal name of your
> user will include a realm that the remote NN doesn't know about. You will
> either need to change the default_realm setting in /etc/krb5.conf on the
> insecure NN to be the same as the secure NN, or you will need to add some
> custom hadoop.security.auth_to_local mappings on the insecure NN so it knows
> how to translate this long principal name into a short name.
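> As an illustration of the auth_to_local option (the realm is made up), the
> insecure NN's core-site.xml could carry something like:
> {code:xml}
> <property>
>   <name>hadoop.security.auth_to_local</name>
>   <value>
>     RULE:[1:$1@$0](.*@SECURE.EXAMPLE.COM)s/@.*//
>     RULE:[2:$1@$0](.*@SECURE.EXAMPLE.COM)s/@.*//
>     DEFAULT
>   </value>
> </property>
> {code}
> The two RULE lines strip the @SECURE.EXAMPLE.COM realm from one- and
> two-component principals respectively; everything else falls through to
> DEFAULT.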
> Even with all these changes, distcp still won't work, since the first thing
> it tries to do when submitting the job is to get a delegation token for all
> the involved NNs, and that fails because the insecure NN isn't running a DT
> secret manager. I haven't been able to figure out a way around this, except
> to make a custom distcp that doesn't unconditionally do this.
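> For context, the failing step at submission time is essentially this call (a
> sketch of what the job client does, not distcp's exact code):
> {code:java}
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.mapreduce.security.TokenCache;
> import org.apache.hadoop.security.Credentials;
>
> public class TokenFetchSketch {
>   // Asks the NN behind each path for a delegation token; the insecure NN
>   // has no DT secret manager, so this is where submission falls over.
>   public static Credentials fetch(Configuration conf, Path src, Path dst)
>       throws Exception {
>     Credentials creds = new Credentials();
>     TokenCache.obtainTokensForNamenodes(creds, new Path[] {src, dst}, conf);
>     return creds;
>   }
> }
> {code}
> Overriding getDelegationToken to return null, as in the workaround above,
> short-circuits exactly this fetch for the webhdfs:// side.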
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)