[
https://issues.apache.org/jira/browse/HADOOP-10768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16278871#comment-16278871
]
Daryn Sharp commented on HADOOP-10768:
--------------------------------------
Sorry for how long it's taken to review.
Implementing the crypto handling as a sasl module that wraps the underlying
module for authentication but handles the QOP itself would be a cleaner
design. It would keep the code simpler and support all current and future
auth types instead of just digest/token-based connections.
The key exchange should use DH or ECDH; using weaker encryption for a
rudimentary key exchange is definitely counterproductive. Refer to the JCA
docs
(https://docs.oracle.com/javase/8/docs/technotes/guides/security/crypto/CryptoSpec.html).
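A minimal sketch of what an ECDH exchange via the JCA looks like (class name, key size, and method names are illustrative, not from any patch): each side generates an EC key pair, exchanges public keys, and derives the same shared secret, which would then be fed into a KDF rather than used raw.

```java
import javax.crypto.KeyAgreement;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.util.Arrays;

public class EcdhSketch {
    // Derive a shared secret from our private key and the peer's public key.
    public static byte[] agree(PrivateKey ours, PublicKey theirs)
            throws GeneralSecurityException {
        KeyAgreement ka = KeyAgreement.getInstance("ECDH");
        ka.init(ours);
        ka.doPhase(theirs, true);
        return ka.generateSecret(); // feed into a KDF, never use raw as a key
    }

    public static void main(String[] args) throws GeneralSecurityException {
        KeyPairGenerator kpg = KeyPairGenerator.getInstance("EC");
        kpg.initialize(256); // illustrative curve size
        KeyPair client = kpg.generateKeyPair();
        KeyPair server = kpg.generateKeyPair();
        byte[] a = agree(client.getPrivate(), server.getPublic());
        byte[] b = agree(server.getPrivate(), client.getPublic());
        System.out.println(Arrays.equals(a, b)); // both sides derive the same secret
    }
}
```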
Forcing AES-CTR as the only option is an artificial restriction based on the
hadoop crypto streams. The standard javax {{Cipher}} may be a better option
since it supports arbitrary algorithms and is thus more future-proof. Since
java 7u40, {{Cipher}} supposedly uses native intrinsics. It would be
interesting to know whether it is now as performant.
Should use AES-GCM for integrity (which {{Cipher}} provides) since that's its
purpose and it uses AES-NI. That should alleviate the cited hmac performance
penalty.
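To illustrate the point (class name and payload are hypothetical): {{AES/GCM/NoPadding}} gives confidentiality plus an integrity tag in a single pass, so no separate hmac is needed, and tampering surfaces as an {{AEADBadTagException}} on decrypt.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;

public class GcmSketch {
    public static void main(String[] args) throws Exception {
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(128);
        SecretKey key = kg.generateKey();

        byte[] iv = new byte[12];            // 96-bit IV; must be unique per message
        new SecureRandom().nextBytes(iv);
        GCMParameterSpec spec = new GCMParameterSpec(128, iv); // 128-bit auth tag

        Cipher enc = Cipher.getInstance("AES/GCM/NoPadding");
        enc.init(Cipher.ENCRYPT_MODE, key, spec);
        byte[] ct = enc.doFinal("rpc payload".getBytes(StandardCharsets.UTF_8));

        Cipher dec = Cipher.getInstance("AES/GCM/NoPadding");
        dec.init(Cipher.DECRYPT_MODE, key, spec);
        byte[] pt = dec.doFinal(ct);         // throws AEADBadTagException if tampered
        System.out.println(new String(pt, StandardCharsets.UTF_8));
    }
}
```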
Manual crypto is hard for mere mortals to do correctly, which is why I didn't
bother to scrutinize it. The manual hmac impl appears to spew garbage
everywhere, which probably contributes to the severe performance penalty. I'm
actually surprised it works "as well" as it does, which gives me hope we can
squeeze out decent performance when using the correct algos.
Using {{Cipher}} would also allow the sasl negotiation to include arbitrary
cipher strings, similar to the sasl instance type, to facilitate using new
ciphers w/o changing protobufs and enums. That negates the need to move/change
the hdfs code.
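A sketch of the idea (the helper and its name are hypothetical, not from any patch): the JCA transformation string itself is what gets negotiated, so supporting a new cipher means adding a new string on both sides, with no protobuf or enum change.

```java
import javax.crypto.Cipher;
import java.security.GeneralSecurityException;
import java.security.Key;
import java.security.spec.AlgorithmParameterSpec;

public class NegotiatedCipher {
    // Build a Cipher directly from the transformation string agreed on
    // during the sasl exchange, e.g. "AES/CTR/NoPadding" or "AES/GCM/NoPadding".
    public static Cipher forNegotiated(String transformation, int mode,
                                       Key key, AlgorithmParameterSpec params)
            throws GeneralSecurityException {
        Cipher c = Cipher.getInstance(transformation);
        c.init(mode, key, params);
        return c;
    }
}
```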
If {{Cipher}} isn't/can't be an option, we should consider exposing the hadoop
cipher engine that glues onto openssl and making it support more algos.
--
*General*
The changes to {{CryptoInputStream}} are orthogonal to the rpc improvements
and should be a separate jira. In any case, the new {{readFully}} will go into
an infinite loop on a non-blocking read that doesn't fill the buffer.
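The hazard is that a non-blocking channel may legitimately return 0 bytes. A hypothetical termination-safe shape (class and method names are illustrative, not the patch's code) returns partial progress instead of spinning, and treats -1 as EOF:

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;

public class ReadFullySketch {
    // Read until the buffer is full, EOF is hit, or the channel stalls.
    public static int readFully(ReadableByteChannel ch, ByteBuffer buf)
            throws IOException {
        int total = 0;
        while (buf.hasRemaining()) {
            int n = ch.read(buf);
            if (n < 0) {      // EOF before the buffer was filled
                throw new EOFException("stream closed after " + total + " bytes");
            }
            if (n == 0) {     // non-blocking channel has no data right now:
                return total; // return partial progress instead of busy-looping
            }
            total += n;
        }
        return total;
    }
}
```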
A general comment: there is no need to null-check refs prior to
{{instanceof}}, since it returns false when the ref is null. Those changes are
also unnecessary.
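For reference, the language guarantees this (trivial standalone example, not from the patch):

```java
public class InstanceofNull {
    public static void main(String[] args) {
        Object ref = null;
        // instanceof evaluates to false for a null reference,
        // so a preceding explicit null check is redundant.
        System.out.println(ref instanceof String); // prints false
    }
}
```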
The {{ipc.Client}} change is unnecessary.
> Optimize Hadoop RPC encryption performance
> ------------------------------------------
>
> Key: HADOOP-10768
> URL: https://issues.apache.org/jira/browse/HADOOP-10768
> Project: Hadoop Common
> Issue Type: Improvement
> Components: performance, security
> Affects Versions: 3.0.0-alpha1
> Reporter: Yi Liu
> Assignee: Dapeng Sun
> Attachments: HADOOP-10768.001.patch, HADOOP-10768.002.patch,
> HADOOP-10768.003.patch, HADOOP-10768.004.patch, HADOOP-10768.005.patch,
> HADOOP-10768.006.patch, HADOOP-10768.007.patch, HADOOP-10768.008.patch,
> Optimize Hadoop RPC encryption performance.pdf
>
>
> Hadoop RPC encryption is enabled by setting {{hadoop.rpc.protection}} to
> "privacy". It utilizes the SASL {{GSSAPI}} and {{DIGEST-MD5}} mechanisms for
> secure authentication and data protection. Even though {{GSSAPI}} supports
> AES, it does not use AES-NI by default, so the encryption is slow and will
> become a bottleneck.
> After discussing with [~atm], [~tucu00] and [~umamaheswararao], we can do the
> same optimization as in HDFS-6606: using AES-NI yields more than a *20x*
> speedup. On the other hand, RPC messages are small but frequent, and there
> may be lots of RPC calls in one connection, so we need to set up a benchmark
> to see the real improvement and then make a trade-off.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)