[
https://issues.apache.org/jira/browse/HDFS-3637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429961#comment-13429961
]
Aaron T. Myers commented on HDFS-3637:
--------------------------------------
Thanks a lot for the very thorough review, Eli. Updated patch incoming.
bq. Testing?
In addition to the included automated tests, I've tested this on a 4-node
cluster, reading and writing files, running MR jobs (tera gens/sorts), etc.
I've seen no issues.
bq. What's the latest performance slowdown for the basic HDFS read/write path
with RC4 enabled?
I haven't done a really thorough benchmark, but my testing indicates about a
1.8-2.2x slowdown with RC4, and a much higher slowdown with 3DES. I think this
description of the relative speed of cipher algorithms in Java is pretty
accurate:
http://www.javamex.com/tutorials/cryptography/ciphers.shtml
bq. Seems like DFSOutputStream#newBlockReader in the conf.useLegacyBlockReader
conditional should use a precondition or throw an RTE (eg AssertionError) if
encryptionKey is null, otherwise the client will just consider this a dead DN
and keep trying.
Good point. Changed to a RuntimeException.
bq. In the other case it should blow up if encryptionKey is null right,
otherwise we can have it enabled server side but allow a client not to use it?
Not quite sure what you mean by this. In which case should we blow up if
encryptionKey is null? Note that the client will never be allowed to not use
encryption if the DN is configured to use it. The error message won't be nice,
but no data will ever be transmitted in the clear.
bq. The dfs.encrypt.data.transfer description that this is a server-side config
Done.
bq. Add dfs.encrypt.data.transfer.algorithm with out a default and list two
supported values?
Added the following:
{code}
<property>
<name>dfs.encrypt.data.transfer.algorithm</name>
<value></value>
<description>
This value may be set to either "3des" or "rc4". If nothing is set, then
the configured JCE default on the system is used (usually 3DES.) It is
widely believed that 3DES is more cryptographically secure, but RC4 is
substantially faster.
</description>
</property>
{code}
bq. Shouldn't shouldEncryptData throw an exception if server defaults is null
instead of assume it shouldn't encrypt? Seems more secure, eg if we ever
introduce a bug that results in the NN returning a null server default (should
never happen currently).
No, for compatibility purposes. With the current implementation, an upgraded
client talking to an older server (without encryption support) will correctly
conclude that it does not need to encrypt data. Again, if we ever were to
introduce a bug like you describe, nothing would be sent in the clear, and the
client would blow up eventually.
bq. Consider pulling out the block manager not setting the block pool ID bug to
a separate change?
Sorry, it's not a bug. It's because I changed BlockTokenSecretManager to take
the BlockPoolId at creation time, instead of every time a BlockToken is
created. This is a reasonable change to make since a single
BlockTokenSecretManager cannot actually issue valid BlockTokens for anything
but a single BlockPoolId. Sorry, I should have mentioned this change in my
description of the patch.
bq. Use DFS_BLOCK_ACCESS_TOKEN_LIFETIME_DEFAULT instead of 15s?
This wouldn't be right, since we've lowered the key update interval and token
lifetime earlier in the test. It also needs to be a few multiples of the block
token lifetime, since several block tokens are valid at any given time (the
current and the last two, by default.)
bq. Also perhaps update the relevant NN java doc to indicate that "getting" the
key generates a new key with this timeout.
I called it "getEncryptionKey" to be in keeping with "getDelegationToken". More
appropriate for these would probably be "generate" instead of "get". What are
your thoughts on this?
bq. Jira for supporting encryption or remove this TODO?
Well, since we're sort of phasing out support for RemoteBlockReader, I doubt
such a JIRA will actually ever be implemented. Perhaps we should just remove
the TODO?
bq. Are the sendReadResult write timeout and DFSOutputStream#flush a separate
issue or something introduced here?
It's no functional change - just a refactor so that
RemoteBlockReader2#writeReadResult takes a stream as an argument, instead of
always creating a new stream from the given socket.
> Add support for encrypting the DataTransferProtocol
> ---------------------------------------------------
>
> Key: HDFS-3637
> URL: https://issues.apache.org/jira/browse/HDFS-3637
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: data-node, hdfs client, security
> Affects Versions: 2.0.0-alpha
> Reporter: Aaron T. Myers
> Assignee: Aaron T. Myers
> Attachments: HDFS-3637.patch, HDFS-3637.patch, HDFS-3637.patch
>
>
> Currently all HDFS RPCs performed by NNs/DNs/clients can be optionally
> encrypted. However, actual data read or written between DNs and clients (or
> DNs to DNs) is sent in the clear. When processing sensitive data on a shared
> cluster, confidentiality of the data read/written from/to HDFS may be desired.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira