[
https://issues.apache.org/jira/browse/HADOOP-13836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15836483#comment-15836483
]
Daryn Sharp commented on HADOOP-13836:
--------------------------------------
SSL is notoriously hard for non-blocking io. The added org.baswerc.niossl
library appears dead and unsupported. V0.2 was posted 1.5y ago. There are a
handful of open bugs about memory leaks, selectors not being reliable, etc. No
responses to the bugs. Not encouraging.
Regarding the Connection subclasses: the readAndProcess method is already a bit
dicey. It embodies the authentication handshake and general rpc message reading
and queuing. I'm hesitant about two different impls because they're likely to
lead to unintended divergence, as illustrated below, and also increase the
chance of security holes. Ideally the ssl channel impl should be transparent
and not require changes to readAndProcess.
Verifying correctness of partial reads is a bit difficult. The position within
the byte[] appBufBytes, which is extracted from ByteBuffer appBuf, is being
tracked via appBuf’s position and repeatedly updated. Both the extraction and
the explicit position updates seem unnecessary. It’d be easier to follow if
applicationBufferRead() took source and dest byte buffers, copied up to dest’s
remaining from the source, and updated both positions. That said…
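For reference, the buffer-to-buffer copy suggested above could look roughly like the sketch below. The class and method names (BufferCopy, copyRemaining) are made up for illustration and are not from the patch; only standard java.nio.ByteBuffer calls are used.

```java
import java.nio.ByteBuffer;

// Hypothetical helper, not part of the patch: illustrates copying between
// two ByteBuffers without extracting a byte[] or tracking offsets by hand.
final class BufferCopy {
    private BufferCopy() {}

    /**
     * Copies as many bytes as fit from src into dst.
     * Both buffers' positions advance by the number of bytes copied.
     */
    static int copyRemaining(ByteBuffer src, ByteBuffer dst) {
        int n = Math.min(src.remaining(), dst.remaining());
        if (n == 0) {
            return 0;
        }
        // A duplicate shares content but has its own position/limit, so
        // src's limit is left untouched while the copy is bounded to n bytes.
        ByteBuffer window = src.duplicate();
        window.limit(window.position() + n);
        dst.put(window);                    // advances dst by n
        src.position(src.position() + n);   // advance src to match
        return n;
    }
}
```

With a helper of this shape the caller only inspects dst.hasRemaining() to know whether the destination is full, and src.hasRemaining() to know whether decrypted bytes are left over for the next call.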
The ssl readAndProcess behavior isn’t equivalent to the current NIO behavior:
read only what’s available, process request when fully read. If the ssl
version encounters a partial payload, it loops until at least the full payload
is read into appBuf. If appBuf isn’t fully consumed it loops again. This
causes problems that NIO is avoiding:
* Multi-threaded clients generating requests faster than read will indefinitely
tie up a reader.
* Clients sending a slow trickle of bytes will tie up a reader until a request
is fully read.
* Clients stalled mid-request will cause the reader to go into a spin loop.
When the reader loops on a connection, the reader’s other established
connections are starved. The reader also isn’t consuming new connections
queued by the listener. Eventually the listener will block and stop accepting.
The result is the ipc layer going into a series of seizures that severely
degrade performance. This may be partly responsible for the performance
degradation.
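To make the contrast concrete, here is a minimal sketch of the "read only what's available, process when fully read" pattern. It is not the actual Server.Connection code: the FrameReader class, the 4-byte length prefix, and the MAX_FRAME limit are assumptions chosen for illustration. The point is that each call makes a single pass and returns, so the reader thread can go back to its selector and service other connections.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;

// Hypothetical per-connection state; assumes frames are a 4-byte big-endian
// length followed by that many payload bytes.
final class FrameReader {
    private static final int MAX_FRAME = 64 * 1024 * 1024; // assumed limit

    private final ByteBuffer lengthBuf = ByteBuffer.allocate(4);
    private ByteBuffer payload; // null until the length prefix is complete

    /**
     * Performs one read pass. Returns the payload if a frame was completed,
     * or null if more bytes are needed. Never loops waiting for the client.
     */
    byte[] readOnce(ReadableByteChannel ch) throws IOException {
        if (payload == null) {
            if (ch.read(lengthBuf) < 0) {
                throw new EOFException("closed before length prefix");
            }
            if (lengthBuf.hasRemaining()) {
                return null; // partial length: come back on the next wake-up
            }
            lengthBuf.flip();
            int len = lengthBuf.getInt();
            lengthBuf.clear();
            if (len < 0 || len > MAX_FRAME) {
                throw new IOException("bad frame length: " + len);
            }
            payload = ByteBuffer.allocate(len);
        }
        if (payload.hasRemaining() && ch.read(payload) < 0) {
            throw new EOFException("closed mid-request");
        }
        if (payload.hasRemaining()) {
            return null; // partial payload: state is kept, the thread is not
        }
        byte[] frame = payload.array();
        payload = null; // ready for the next frame
        return frame;
    }
}
```

A stalled or trickling client costs only the retained buffers here, since the partial state lives in the connection object rather than on the reader's stack. An ssl channel wrapper that decrypts into an internal buffer and exposes the same read(ByteBuffer) contract would let a loop like this stay unchanged.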
Another issue is the all-or-nothing requirement for enabling ssl. I’d be
potentially interested in using ssl if I could configure which hosts require
ssl, i.e. intra-colo. The sasl qop impl allows the server to selectively
control whether clients are forced to encrypt. The same would be nice for ssl.
Lastly, the 11-14% performance degradation is unacceptable for production use.
I would have expected terasort’s heavy cpu usage to eclipse the ssl rpc load.
It didn’t, which implies that less cpu-intensive jobs will take a more
pronounced hit?
> Securing Hadoop RPC using SSL
> -----------------------------
>
> Key: HADOOP-13836
> URL: https://issues.apache.org/jira/browse/HADOOP-13836
> Project: Hadoop Common
> Issue Type: New Feature
> Components: ipc
> Reporter: kartheek muthyala
> Assignee: kartheek muthyala
> Attachments: HADOOP-13836.patch, HADOOP-13836-v2.patch,
> HADOOP-13836-v3.patch, HADOOP-13836-v4.patch, Secure IPC OSS Proposal-1.pdf,
> SecureIPC Performance Analysis-OSS.pdf
>
>
> Today, RPC connections in Hadoop are encrypted using Simple Authentication &
> Security Layer (SASL), with the Kerberos ticket based authentication or
> Digest-md5 checksum based authentication protocols. This proposal is about
> enhancing this cipher suite with SSL/TLS based encryption and authentication.
> SSL/TLS is a proposed Internet Engineering Task Force (IETF) standard, that
> provides data security and integrity across two different end points in a
> network. This protocol has made its way to a number of applications such as
> web browsing, email, internet faxing, messaging, VOIP etc. And supporting
> this cipher suite at the core of Hadoop would give a good synergy with the
> applications on top and also bolster industry adoption of Hadoop.
> The Server and Client code in Hadoop IPC should support the following modes
> of communication
> 1. Plain
> 2. SASL encryption with an underlying authentication
> 3. SSL based encryption and authentication (x509 certificate)