[
https://issues.apache.org/jira/browse/HDFS-13541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16470781#comment-16470781
]
Chen Liang commented on HDFS-13541:
-----------------------------------
Thanks a lot for taking a look [~benoyantony]!
bq. We could also think of passing additional parameters based on the connection
This is a good point. My first thought is to pass the entire
Server#Connection instance to the resolver (it also includes the IP address
and the ingress port). Beyond these, other potentially relevant fields
might include user, remote, and hostAddress (though I am not sure hostAddress
is set correctly at all). What do you think?
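Here is a minimal sketch of such a resolver. It assumes a hypothetical `ConnectionInfo` record that carries the fields mentioned above (remote address, ingress port, user) and a hypothetical `ConnectionAwareSaslResolver` interface. Neither is an existing Hadoop API. The only real API used is the JDK constant `javax.security.sasl.Sasl.QOP`. Port 9021 as the "encrypted" port is an illustrative assumption.

```java
import java.net.InetAddress;
import java.util.Map;
import javax.security.sasl.Sasl;

// Hypothetical snapshot of the Server#Connection fields discussed above;
// the real Connection class is internal to Hadoop IPC.
record ConnectionInfo(InetAddress remoteAddress, int ingressPort, String user) {}

// Hypothetical resolver contract that receives connection details
// instead of only the client InetAddress.
interface ConnectionAwareSaslResolver {
  Map<String, String> getServerProperties(ConnectionInfo conn);
}

// Example policy: the port reserved for external clients requires privacy
// (SASL "auth-conf"); every other port uses authentication only ("auth").
final class PortBasedResolver implements ConnectionAwareSaslResolver {
  private final int encryptedPort;

  PortBasedResolver(int encryptedPort) {
    this.encryptedPort = encryptedPort;
  }

  @Override
  public Map<String, String> getServerProperties(ConnectionInfo conn) {
    String qop = conn.ingressPort() == encryptedPort ? "auth-conf" : "auth";
    return Map.of(Sasl.QOP, qop);
  }
}
```

Passing a small value object rather than the whole Server#Connection would keep the resolver decoupled from IPC internals. The trade-off is that you have to decide up front which fields to expose.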
bq. If so, what prevents the external client from replaying the encrypted
message from a different connection between an internal client and datanode?
As of now, the main protection against replay attacks is that the key
expires after 10 minutes by default. After the key expires, the NN and DN will use
different keys, and the encrypted message is invalidated. That is, an attacker
has at most a 10-minute window to replay the encrypted message. We consider this
sufficient for now. If I understand correctly, this is the same
rationale as for the block access token: without talking to the NN, someone may
connect to a DN directly and replay a block access token, but only within that
10-minute window.
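The replay bound described above can be sketched as follows. This is an illustration, not the actual HDFS key-management code. HMAC-SHA256 stands in for the encryption of the QOP message, `ExpiringKey` is a hypothetical class, and the 10-minute lifetime mirrors the default mentioned in the comment.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.time.Duration;
import java.time.Instant;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical key with a fixed lifetime. A message protected under it is
// accepted only while the key is live, which bounds the replay window.
final class ExpiringKey {
  static final Duration LIFETIME = Duration.ofMinutes(10); // default from the comment
  private final byte[] secret;
  private final Instant expiresAt;

  ExpiringKey(byte[] secret, Instant issuedAt) {
    this.secret = secret.clone();
    this.expiresAt = issuedAt.plus(LIFETIME);
  }

  // HMAC-SHA256 tag over the message (stand-in for the real encryption).
  byte[] tag(String message) {
    try {
      Mac mac = Mac.getInstance("HmacSHA256");
      mac.init(new SecretKeySpec(secret, "HmacSHA256"));
      return mac.doFinal(message.getBytes(StandardCharsets.UTF_8));
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException("HmacSHA256 unavailable", e);
    }
  }

  // Rejects once the key has expired, even if the tag is otherwise valid.
  boolean verify(String message, byte[] tag, Instant now) {
    if (!now.isBefore(expiresAt)) {
      return false; // key rolled over: a replayed message is no longer accepted
    }
    return MessageDigest.isEqual(tag(message), tag); // constant-time compare
  }
}
```

A message captured at time t can be replayed successfully only until the key that protected it expires. This is the same bound the block access token relies on.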
We considered adding more identifying information to the QOP string, such
as the client IP or some timestamp-based info. This adds more variability to the
message itself, but it also adds encryption overhead (because the
message is larger). Also, while adding the IP address might be relatively
straightforward, other info such as a timestamp may be very tricky to manage
here. Currently we are inclined not to go with this optimization. Any comments on
this?
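To make the trade-off concrete, here is a hypothetical sketch of binding the client IP into the protected QOP message. `BoundQop` and the `qop|ip` payload format are assumptions for illustration, and HMAC-SHA256 stands in for whatever protection the real patch applies. A message captured on one connection fails verification when it is replayed from a different address. The protected payload grows by the separator plus the IP string.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical: protect "qop|clientIp" instead of just "qop".
final class BoundQop {
  private final SecretKeySpec key;

  BoundQop(byte[] secret) {
    this.key = new SecretKeySpec(secret, "HmacSHA256");
  }

  // Payload is "qop" (unbound) or "qop|ip" (bound); the IP is the extra size.
  static byte[] payload(String qop, String clientIp) {
    String s = clientIp == null ? qop : qop + "|" + clientIp;
    return s.getBytes(StandardCharsets.UTF_8);
  }

  byte[] tag(byte[] payload) {
    try {
      Mac mac = Mac.getInstance("HmacSHA256");
      mac.init(key);
      return mac.doFinal(payload);
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException("HmacSHA256 unavailable", e);
    }
  }

  // The DataNode recomputes the tag using the IP it observes on the socket,
  // so a message captured from another client's connection does not verify.
  boolean verify(String qop, String observedIp, byte[] tag) {
    return MessageDigest.isEqual(tag(payload(qop, observedIp)), tag);
  }
}
```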
bq. Another side effect of derived QOP for data transfer protection is that one
cannot enable RPC protection alone with this approach.
This is true of my current POC, because in our environment the NN and DN always
use the same protection. But we can add configurations that allow enforcing
RPC protection only; we just need to be able to configure the DN to ignore the
derived QOP.
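Here is one way the DataNode could decide which QOP to apply. The sketch uses a hypothetical `ignoreDerivedQop` switch, which is not an existing Hadoop configuration key. `configuredQop` stands for the DataNode's own data transfer protection setting and is assumed to be already mapped to SASL QOP names (`auth`, `auth-int`, `auth-conf`).

```java
import java.util.Optional;

// Hypothetical DataNode-side policy for data transfer connections.
final class DataTransferQopPolicy {
  private final String configuredQop;     // DN's own setting, as a SASL QOP name
  private final boolean ignoreDerivedQop; // hypothetical switch, not a real Hadoop key

  DataTransferQopPolicy(String configuredQop, boolean ignoreDerivedQop) {
    this.configuredQop = configuredQop;
    this.ignoreDerivedQop = ignoreDerivedQop;
  }

  // derivedQop is the QOP the NameNode handed to the client, if any.
  String effectiveQop(Optional<String> derivedQop) {
    if (ignoreDerivedQop) {
      return configuredQop; // RPC-only mode: data transfer keeps the DN setting
    }
    return derivedQop.orElse(configuredQop); // otherwise follow the NameNode
  }
}
```

With the switch off, the DN follows whatever QOP the NameNode derived. With it on, only RPC traffic is selectively encrypted, and data transfer keeps the DN's own setting.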
bq. As mentioned in the document, encrypting the entire data pipeline is not
necessary. I believe it should be optimized
Sure, will work on that.
bq. I prefer the approach where datanode also listens on two ports, as it makes
the entire approach easy to understand
On implementation complexity: it means we would need to change NN-DN
communication so that the DN informs the NN about its new port, and the DN
maintenance code logic already seems a bit convoluted. On the practical side, in
our environment cross-data-center traffic actually makes up a small fraction of
all traffic, so having an additional DataXceiverServer thread listening
on every single datanode, but idle most of the time, does not seem
ideal. [~shv] may have more comments on this; he is on vacation until next
week. In the meantime, I will re-evaluate my other POC patch for this
alternative approach.
> NameNode Port based selective encryption
> ----------------------------------------
>
> Key: HDFS-13541
> URL: https://issues.apache.org/jira/browse/HDFS-13541
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode, namenode, security
> Reporter: Chen Liang
> Assignee: Chen Liang
> Priority: Major
> Attachments: NameNode Port based selective encryption-v1.pdf
>
>
> Here at LinkedIn, one issue we face is that we need to enforce different
> security requirements based on the location of the client and the cluster.
> Specifically, for clients from outside the data center, regulations require
> that all traffic be encrypted. But for clients within the
> same data center, unencrypted connections are preferred to avoid the high
> encryption overhead.
> HADOOP-10221 introduced a pluggable SASL resolver, on which HADOOP-10335
> built WhitelistBasedResolver to solve the same problem. However, we
> found it difficult to fit into our environment for several reasons. In this
> JIRA, on top of the pluggable SASL resolver, *we propose a different approach:
> running two RPC ports on the NameNode, which enforce encrypted and
> unencrypted connections respectively, while subsequent DataNode access
> simply follows the same encryption/non-encryption behaviour*. Then, by
> blocking the unencrypted port at the datacenter firewall, we can completely
> block unencrypted external access.
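The proposed flow can be sketched end to end as below. Everything here is an assumption for illustration. `SelectiveEncryptionFlow` is hypothetical, and AES-GCM (from the standard `javax.crypto` API) stands in for however the real patch encrypts the QOP with the key shared by the NameNode and DataNodes. Ports 9021 and 8020 are example encrypted and unencrypted ports. The NameNode derives the QOP from the ingress port and seals it. The client forwards the opaque bytes, and the DataNode opens them and applies the same QOP.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical sketch: NN picks QOP by ingress port and seals it; DN opens it.
final class SelectiveEncryptionFlow {
  private static final int IV_LEN = 12, TAG_BITS = 128;
  private final SecretKey sharedKey;  // stands in for the NN/DN shared key
  private final int encryptedPort;    // example port for external clients
  private final SecureRandom random = new SecureRandom();

  SelectiveEncryptionFlow(byte[] keyBytes, int encryptedPort) {
    this.sharedKey = new SecretKeySpec(keyBytes, "AES"); // 16, 24 or 32 bytes
    this.encryptedPort = encryptedPort;
  }

  // NameNode side: derive QOP from the RPC port, then seal it as iv || ciphertext.
  byte[] sealQopForPort(int ingressPort) {
    String qop = ingressPort == encryptedPort ? "auth-conf" : "auth";
    try {
      byte[] iv = new byte[IV_LEN];
      random.nextBytes(iv);
      Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
      c.init(Cipher.ENCRYPT_MODE, sharedKey, new GCMParameterSpec(TAG_BITS, iv));
      byte[] ct = c.doFinal(qop.getBytes(StandardCharsets.UTF_8));
      return ByteBuffer.allocate(IV_LEN + ct.length).put(iv).put(ct).array();
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException(e);
    }
  }

  // DataNode side: open the sealed QOP; any tampering fails GCM authentication.
  String openQop(byte[] sealed) {
    try {
      Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
      c.init(Cipher.DECRYPT_MODE, sharedKey, new GCMParameterSpec(TAG_BITS, sealed, 0, IV_LEN));
      byte[] pt = c.doFinal(sealed, IV_LEN, sealed.length - IV_LEN);
      return new String(pt, StandardCharsets.UTF_8);
    } catch (GeneralSecurityException e) {
      throw new IllegalArgumentException("sealed QOP rejected", e);
    }
  }
}
```

Because GCM authenticates the ciphertext, a client that flips bits in the sealed QOP (for example, trying to downgrade `auth-conf` to `auth`) is rejected rather than silently downgraded.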
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)