[ 
https://issues.apache.org/jira/browse/HDFS-13541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16470781#comment-16470781
 ] 

Chen Liang commented on HDFS-13541:
-----------------------------------

Thanks a lot for taking a look [~benoyantony]! 
bq. We could also think of passing additional parameters based on the connection
This is a good point. My first thought is to pass the entire 
Server#Connection instance to the resolver, since it already includes the IP 
address and the ingress port. Beyond those, other potentially relevant fields 
include user, remote, and hostAddress (though I am not sure hostAddress is 
always set correctly). What do you think?
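To make the port-based idea concrete, here is a minimal, self-contained sketch. This is not Hadoop's actual resolver API; the class name, method, and port numbers are all hypothetical, and a real implementation would extend the pluggable SASL resolver from HADOOP-10221 instead:

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: derive the SASL QOP from the NameNode ingress
 * port the client connected to. Not the actual patch code.
 */
public class PortBasedQopResolver {
    // One QOP per listener port; port numbers here are illustrative.
    private final Map<Integer, String> qopByPort = new HashMap<>();

    public PortBasedQopResolver(int unencryptedPort, int encryptedPort) {
        qopByPort.put(unencryptedPort, "auth");      // authentication only
        qopByPort.put(encryptedPort, "auth-conf");   // full encryption
    }

    /** Returns the QOP enforced on the given ingress port. */
    public String resolve(int ingressPort) {
        String qop = qopByPort.get(ingressPort);
        if (qop == null) {
            throw new IllegalArgumentException(
                "Unknown ingress port: " + ingressPort);
        }
        return qop;
    }
}
```

The same lookup could key off other Server#Connection fields (user, remote address) if those turn out to be reliable.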

bq. If so, what prevents the external  client from replaying the encrypted 
message from a different connection between an internal client and datanode ?
As of now, the main thing that prevents a replay attack is that the key 
expires after 10 minutes by default. Once the key expires, the NN and DN will 
be using different keys and the encrypted message becomes invalid. In other 
words, the attacker has at most a 10-minute window in which to replay the 
encrypted message. We consider this sufficient for now. If I understand 
correctly, the same rationale applies to block access tokens: without talking 
to the NN, someone may connect to the DN directly and replay a block access 
token, but only within that 10-minute window. 
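The replay-window argument above can be sketched as a simple time check. The names and the 10-minute default below are illustrative, mirroring the description rather than the actual key-rolling code:

```java
/**
 * Minimal sketch of the replay-window reasoning: a replayed encrypted
 * message is only accepted while the key that produced it is still
 * current; once NN and DN roll to a new key, decryption fails.
 * KEY_LIFETIME_MS mirrors the 10-minute default from the discussion.
 */
public class KeyWindow {
    static final long KEY_LIFETIME_MS = 10 * 60 * 1000;

    /** True while the originating key is still valid. */
    public static boolean withinReplayWindow(long keyCreatedMs, long nowMs) {
        return nowMs - keyCreatedMs < KEY_LIFETIME_MS;
    }
}
```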

We considered adding more identification info, in addition to the QOP string, 
such as the client IP or some timestamp-based info. This adds more variables 
to the message itself, but it also adds more encryption overhead (because the 
message is larger). Also, while adding the IP address might be relatively 
straightforward, other info such as a timestamp could be very tricky to 
manage here. Currently we are inclined not to go with this optimization. 
Comments on this?

bq. Another side effect of derived QOP for data transfer protection is that one 
cannot enable RPC protection alone with this approach.
This is true in my current POC, because in our environment the NN and DN 
always enforce the same protection. But we can add configurations to allow 
enforcing RPC protection alone; we just need to be able to configure the DN 
to ignore the derived QOP.
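For illustration only, such a DN-side override could be a boolean property in hdfs-site.xml; the property name below is hypothetical and would be settled in the actual patch:

```xml
<!-- Hypothetical property name, for illustration only -->
<property>
  <name>dfs.datanode.ignore.derived.qop</name>
  <value>true</value>
  <description>
    When true, the DataNode ignores the QOP derived from the NameNode
    ingress port, so RPC protection can be enforced alone.
  </description>
</property>
```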

bq. As mentioned in the document, Encrypting the entire data pipeline is not 
necessary. I believe, it should be optimized
Sure, will work on that.

bq. I prefer the approach where datanode also listens on two ports, as it makes 
the entire approach easy to understand
On implementation complexity: it means we will need to change NN-DN 
communication so that the DN informs the NN about the new port it listens on, 
and the DN maintenance code logic already seems a bit convoluted. On the 
practical side, in our environment cross-data-center traffic actually makes 
up a small fraction of all traffic, so having an additional DataXceiverServer 
thread sitting and listening on every single DataNode, while being idle most 
of the time, does not seem ideal. [~shv] may have more comments on this; he 
is on vacation until next week. In the meantime, I will re-evaluate my other 
POC patch for this alternative approach.

> NameNode Port based selective encryption
> ----------------------------------------
>
>                 Key: HDFS-13541
>                 URL: https://issues.apache.org/jira/browse/HDFS-13541
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode, namenode, security
>            Reporter: Chen Liang
>            Assignee: Chen Liang
>            Priority: Major
>         Attachments: NameNode Port based selective encryption-v1.pdf
>
>
> Here at LinkedIn, one issue we face is that we need to enforce different 
> security requirements based on the location of the client relative to the 
> cluster. Specifically, for clients from outside of the data center, it is 
> required by regulation that all traffic must be encrypted. But for clients 
> within the same data center, unencrypted connections are preferred to avoid 
> the high encryption overhead. 
> HADOOP-10221 introduced pluggable SASL resolver, based on which HADOOP-10335 
> introduced WhitelistBasedResolver which solves the same problem. However we 
> found it difficult to fit into our environment for several reasons. In this 
> JIRA, on top of the pluggable SASL resolver, *we propose a different 
> approach of running RPC on two ports on the NameNode, where the two ports 
> enforce encrypted and unencrypted connections respectively, and subsequent 
> DataNode access simply follows the same encrypted/unencrypted behaviour*. 
> Then, by blocking the unencrypted port at the data center firewall, we can 
> completely block unencrypted external access.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
